Particle.news

OpenAI and Paradigm Release EVMbench as Agentic Crypto Infrastructure Accelerates

The benchmark measures AI agents on real Ethereum bugs and highlights fast‑rising exploit skills that promise stronger defenses yet widen the threat surface.

Overview

  • EVMbench evaluates AI agents on detecting, patching, and exploiting Ethereum smart contract vulnerabilities using a standardized, open framework.
  • The benchmark draws on 120 high‑severity issues from 40 real‑world audits, including scenarios from Stripe’s Tempo security review.
  • Paradigm reports that top models’ exploit success rate rose from under 20% at project start to above 70%, indicating rapid capability gains.
  • OpenAI expanded the private beta of its Aardvark security research agent and committed $10 million in API credits to support defensive crypto research.
  • Parallel efforts push agent autonomy. Sigil Wen’s Conway equips agents with wallets, x402 pay‑for‑compute, and deployment tools, and the Automaton reference agent demonstrates those capabilities. Dragonfly’s new fund cites agentic payments as a core thesis, while observers flag significant dual‑use risk.