New benchmark from OpenAI and Paradigm demonstrates rapid advances in AI’s ability to identify and exploit DeFi flaws, outpacing detection and patching.
What to know:
GPT-5.3-Codex exploited 72.2% of 120 high-severity vulnerabilities in a new EVMbench test, more than doubling the 31.9% rate of its predecessor GPT-5 in just six months.
While exploit capabilities surge, AI agents lag in detection (recovering only a fraction of known flaws) and patching (fixes that often fail to preserve contract behavior), an imbalance that favors offensive use.
The findings amplify warnings from firms like Cecuro, where specialized defensive AI detected 92% of exploited DeFi contracts, amid fears that AI is supercharging crypto hacks at low cost.
OpenAI, in partnership with crypto VC firm Paradigm, has unveiled EVMbench, a groundbreaking benchmark evaluating AI agents’ prowess in handling smart contract vulnerabilities—spanning detection, patching, and exploitation. Released on Wednesday, the tool assesses agents across 120 curated high-severity flaws drawn from 40 real-world audits, primarily from open code competitions and Tempo blockchain security reviews. These vulnerabilities collectively safeguard over $100 billion in on-chain assets, underscoring the high stakes for DeFi ecosystems.
In exploit mode, agents simulate end-to-end attacks in a sandboxed environment, draining funds via transaction replays and on-chain verification. OpenAI’s latest model, GPT-5.3-Codex, scored 72.2% success, a stark leap from GPT-5’s 31.9% just half a year earlier: exploit success more than doubled in six months. The acceleration aligns with broader trends, as AI lowers the barrier to large-scale scanning, with average exploit attempts costing as little as $1.22 per contract.
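Taking the two reported scores at face value, the implied growth rate is easy to check. The sketch below is a back-of-envelope calculation only; it assumes (simplistically) smooth exponential growth between the two measurements, which cannot hold for long since success rates cap at 100%:

```python
import math

# Reported EVMbench exploit success rates, six months apart.
gpt5_score = 0.319    # GPT-5
codex_score = 0.722   # GPT-5.3-Codex
months_elapsed = 6.0

# Growth factor over the interval and the implied doubling time,
# under a naive exponential-growth assumption.
growth = codex_score / gpt5_score
doubling_time = months_elapsed * math.log(2) / math.log(growth)

print(f"growth factor: {growth:.2f}x")                      # ~2.26x
print(f"implied doubling time: {doubling_time:.1f} months") # ~5.1 months
```

On these numbers the doubling time comes out to roughly five months, a useful sanity check against headline growth claims.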
However, performance in detection and patching modes reveals gaps: agents often surface only a subset of the seeded vulnerabilities and struggle to fix them without breaking functionality. Detection is scored by recall against ground-truth issues, while patching requires preserving contract behavior under automated tests. Experts note that clear objectives like “drain funds” favor exploits, whereas nuanced tasks like auditing demand domain-specific heuristics—echoing findings from Cecuro’s recent study, where a specialized AI outperformed general models by detecting 92% of 90 exploited DeFi contracts versus 34% for a GPT-5.1 baseline.
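Recall-based detection scoring of this kind boils down to comparing an agent’s findings against a labeled ground-truth set. The helper below is illustrative only, with hypothetical names, not EVMbench’s actual scoring code:

```python
def detection_recall(reported: set[str], ground_truth: set[str]) -> float:
    """Fraction of known vulnerabilities the agent actually flagged.

    `reported` and `ground_truth` hold vulnerability identifiers
    (e.g. audit finding IDs). Illustrative sketch, not the real
    EVMbench harness.
    """
    if not ground_truth:
        return 1.0  # nothing to find
    return len(reported & ground_truth) / len(ground_truth)

# An agent that flags 2 of 3 seeded flaws scores ~0.67 recall.
recall = detection_recall({"reentrancy", "oracle-drift"},
                          {"reentrancy", "oracle-drift", "overflow"})
print(f"{recall:.2f}")  # 0.67
```

Note that recall alone says nothing about false positives, which is one reason “partial recall” understates how far detection agents trail exploit agents in practice.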
The benchmark arrives as AI’s dual-use nature intensifies crypto security debates. Chainalysis’ 2026 Crypto Crime Report highlights AI-enabled scams yielding 4.5 times more revenue than traditional ones, with $17 billion stolen in 2025 alone. North Korean hackers and others are leveraging AI for automated exploits, while DeFi losses from hacks exceeded $2.3 billion in a single year. Forbes Council experts urge DeFi teams to shift from passive audits to active defenses, embedding AI in CI/CD pipelines for continuous monitoring, anomaly detection, and circuit breakers that halt exploits in real time.
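The circuit-breaker pattern mentioned above can be sketched in a few lines: a monitor that trips once outflows within a sliding window exceed a threshold. Class names, thresholds, and the off-chain setting are all hypothetical, purely to illustrate the pattern; a real deployment would pause the contract on-chain (e.g. via a guardian role):

```python
from collections import deque


class CircuitBreaker:
    """Trips when recent outflows exceed a threshold within a time window.

    Hypothetical monitoring sketch -- real systems would enforce the
    halt on-chain, not in an off-chain Python process.
    """

    def __init__(self, max_outflow: float, window_seconds: float):
        self.max_outflow = max_outflow
        self.window = window_seconds
        self.events: deque[tuple[float, float]] = deque()  # (timestamp, amount)
        self.tripped = False

    def record_outflow(self, amount: float, now: float) -> bool:
        """Record a withdrawal; return True if the breaker has tripped."""
        self.events.append((now, amount))
        # Drop events that fell out of the sliding window.
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()
        if sum(a for _, a in self.events) > self.max_outflow:
            self.tripped = True  # downstream logic would pause the contract
        return self.tripped


# Trip if more than 1,000 tokens leave within 60 seconds.
breaker = CircuitBreaker(max_outflow=1_000, window_seconds=60)
print(breaker.record_outflow(400, now=0))   # False: 400 <= 1,000
print(breaker.record_outflow(700, now=10))  # True: 1,100 > 1,000
```

The sliding window is the key design choice: it distinguishes a sudden drain, which an automated exploit produces, from the same volume spread over hours of normal activity.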
EVMbench has been open-sourced on GitHub to track evolving risks and promote AI-assisted auditing. Yet, as offensive capabilities outpace defensive adoption, industry voices like those on X warn of an “arms race” in Web3 security, with agents potentially manipulating markets through MEV extraction or front-running. OpenAI emphasizes responsible use, advocating for benchmarks like this to bolster defenses before vulnerabilities turn into billion-dollar breaches.