OpenAI Tests AI Models for Smart Contract Security

On February 18, 2026, OpenAI and crypto investment firm Paradigm jointly launched EVMbench, an open-source benchmark that tests how well AI agents can detect, patch, and exploit vulnerabilities in Ethereum-based smart contracts.

The timing is apt: smart contracts currently secure over $100 billion in open-source crypto assets.

What Is EVMbench and How Does It Test Smart Contract Security?

EVMbench draws on 120 curated vulnerabilities across 40 professional audits, most pulled from Code4rena, a platform where security researchers race to find bugs in live codebases. Until now, no standardized tool existed to measure AI performance in this environment.

To fix that, OpenAI open-sourced the full dataset, tooling, and evaluation harness so developers can consistently test models as AI capabilities evolve.

Specifically, EVMbench tests agents across three modes: Detect (identify vulnerabilities), Patch (fix them without breaking functionality), and Exploit (execute fund-draining attacks inside a sandboxed environment). To keep things safe, the system runs all tests in an isolated environment so no real money is ever at risk.
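As a rough illustration of how outcomes across the three modes might be tallied, the sketch below computes a per-mode success rate from a list of task results. The `TaskResult` record, the `success_rate_by_mode` helper, and the sample outcomes are all hypothetical; none of them are taken from the EVMbench harness itself.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three modes named in the article; the lowercase labels are an assumption.
MODES = ("detect", "patch", "exploit")


@dataclass(frozen=True)
class TaskResult:
    """One hypothetical benchmark outcome for a single vulnerability."""
    mode: str
    vulnerability_id: str
    success: bool


def success_rate_by_mode(results):
    """Return the fraction of successful tasks for each mode that has results."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for result in results:
        if result.mode not in MODES:
            raise ValueError(f"unknown mode: {result.mode}")
        totals[result.mode] += 1
        passed[result.mode] += int(result.success)
    return {mode: passed[mode] / totals[mode] for mode in MODES if totals[mode]}


# Made-up outcomes for illustration only; these are not EVMbench results.
sample = [
    TaskResult("detect", "vuln-001", True),
    TaskResult("detect", "vuln-002", False),
    TaskResult("patch", "vuln-001", False),
    TaskResult("exploit", "vuln-001", True),
    TaskResult("exploit", "vuln-002", True),
]

rates = success_rate_by_mode(sample)
for mode, rate in rates.items():
    print(f"{mode}: {rate:.1%}")
```

Reporting each mode separately matters because an agent can be strong at one task and weak at another.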

How OpenAI GPT-5.3-Codex Scores Against Real Blockchain Vulnerabilities

So far, the results show significant progress. In Exploit mode, GPT-5.3-Codex scored 72.2%, up from GPT-5’s 31.9% just six months earlier. In fact, Paradigm partner Alpin Yukseloglu noted that when the project started, top models could exploit fewer than 20% of critical bugs. Today, that figure sits above 70%.
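The size of that jump can be checked from the two reported Exploit-mode scores. The short calculation below uses only those two figures; the variable names are illustrative.

```python
# Exploit-mode scores as reported in the article, in percent.
gpt5_score = 31.9
gpt53_codex_score = 72.2

# Absolute gain in percentage points and relative improvement as a multiple.
point_gain = gpt53_codex_score - gpt5_score
ratio = gpt53_codex_score / gpt5_score

print(f"Gain: {point_gain:.1f} percentage points")
print(f"Ratio: {ratio:.2f}x the earlier score")
```

In other words, the newer model's Exploit score is roughly 40 percentage points higher, or a little more than double the earlier figure.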

Nevertheless, detection and patching remain harder problems. Agents frequently stop after finding a single issue, and Patch success rates still fall short of full coverage.

Why Blockchain Developers Need AI Security Auditing Tools Right Now

As blockchain adoption accelerates, manual auditing simply cannot keep up. As a result, EVMbench enables developers to run vulnerability sweeps in hours rather than days, freeing human auditors to focus on the most complex edge cases.

Beyond that, OpenAI also committed $10 million in API credits to support defensive cybersecurity research for open-source and critical infrastructure projects.

Still, the same AI capabilities that help defenders can also accelerate cyberattacks. OpenAI acknowledged this directly and says it is taking an evidence-based approach: accelerating defensive capabilities while putting safeguards in place to slow misuse.

Ultimately, open-sourcing EVMbench means any developer can now test AI models against the same standards that top security researchers use.
