
On February 18, 2026, OpenAI and crypto investment firm Paradigm jointly launched EVMbench, an open-source benchmark that tests how well AI agents can detect, patch, and exploit vulnerabilities in Ethereum-based smart contracts.
The timing matters: smart contracts currently secure over $100 billion in open-source crypto assets.
What Is EVMbench and How Does It Test Smart Contract Security?
EVMbench draws on 120 curated vulnerabilities across 40 professional audits, most pulled from Code4rena, a platform where security researchers race to find bugs in live codebases. Until now, no standardized tool existed to measure AI performance in this environment.
To fix that, OpenAI open-sourced the full dataset, tooling, and evaluation harness so developers can consistently test models as AI capabilities evolve.
Specifically, EVMbench tests agents across three modes: Detect (identify vulnerabilities), Patch (fix them without breaking functionality), and Exploit (execute fund-draining attacks inside a sandboxed environment). To keep things safe, the system runs all tests in an isolated environment so no real money is ever at risk.
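The three-mode setup can be pictured as a simple scoring loop. The sketch below is purely illustrative, not the actual EVMbench harness; every name in it (`Mode`, `Task`, `score_run`) is a hypothetical stand-in for whatever the open-sourced tooling actually exposes.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical sketch of a three-mode evaluation harness in the spirit of
# EVMbench. All names here are illustrative assumptions, not the real API.

class Mode(Enum):
    DETECT = "detect"    # agent must identify the vulnerability
    PATCH = "patch"      # agent must fix it without breaking functionality
    EXPLOIT = "exploit"  # agent must drain funds inside a sandbox

@dataclass
class Task:
    audit_id: str  # which professional audit the bug came from
    vuln_id: str   # which curated vulnerability is under test
    mode: Mode

def score_run(tasks, results):
    """Aggregate per-mode pass rates as a fraction of tasks solved."""
    totals, passed = {}, {}
    for task, ok in zip(tasks, results):
        totals[task.mode] = totals.get(task.mode, 0) + 1
        passed[task.mode] = passed.get(task.mode, 0) + (1 if ok else 0)
    return {mode: passed[mode] / totals[mode] for mode in totals}

tasks = [
    Task("audit-01", "vuln-001", Mode.EXPLOIT),
    Task("audit-01", "vuln-002", Mode.EXPLOIT),
    Task("audit-02", "vuln-003", Mode.DETECT),
]
print(score_run(tasks, [True, False, True]))
```

Reporting a separate pass rate per mode is what lets the benchmark say, for example, that exploit performance has improved while detection and patching lag behind.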
How OpenAI GPT-5.3-Codex Scores Against Real Blockchain Vulnerabilities
So far, the results show significant progress. In Exploit mode, GPT-5.3-Codex scored 72.2%, up from GPT-5's 31.9% just six months earlier. In fact, Paradigm partner Alpin Yukseloglu noted that when the project started, top models could exploit fewer than 20% of critical bugs. Today, that figure sits above 70%.
Nevertheless, detection and patching remain harder problems. Agents frequently stop after finding a single issue, and Patch success rates still fall short of full coverage.
Why Blockchain Developers Need AI Security Auditing Tools Right Now
As blockchain adoption accelerates, manual auditing simply cannot keep up. As a result, EVMbench enables developers to run vulnerability sweeps in hours rather than days, freeing human auditors to focus on the most complex edge cases.
Beyond that, OpenAI also committed $10 million in API credits to support defensive cybersecurity research for open-source and critical infrastructure projects.
Still, the same AI that helps defenders can also accelerate cyberattacks. OpenAI acknowledged this directly and is taking an evidence-based approach: accelerating defensive capabilities while putting safeguards in place to slow misuse.
Ultimately, open-sourcing EVMbench means any developer can now test AI models against the same standards that top security researchers use.
