- No existing benchmark measured whether AI agents can find real API bugs from a schema and payload alone
- 100+ downloads by developers and contributors in the first week; dataset freely available on Hugging Face
- KushoAI has run its own agent against the benchmark; head-to-head comparison report in development
SAN FRANCISCO, April 2, 2026 /PRNewswire/ — KushoAI, an AI-native API testing platform used by 30,000+ engineers across 6,000+ enterprises and high-growth technology companies, today released APIEval-20, an open benchmark for evaluating whether AI agents can generate tests that catch real bugs in APIs given only a request schema and sample payload: no source code, no documentation, no additional context.
Analysis of 1.4 million AI-driven test executions across 2,616 organizations shows that authentication failures account for 34% of API outages and that 41% of APIs experience undocumented schema changes within 30 days, yet no standard existed for measuring whether AI agents could detect these failures systematically. APIEval-20 extends the benchmark tradition established by HumanEval for code generation and SWE-bench for bug fixing, applying the same rigor to API testing.
Abhishek Saikia, Co-Founder & CEO, KushoAI, said, “Every vendor selling AI-powered API testing uses the same language: schema validation, payload fuzzing, bug detection. There has been no shared reference point for what any of that means in practice. APIEval-20 gives the field a concrete, reproducible measure of whether an AI agent thinks like a QA engineer.”
A Head of Engineering at a Fortune 500 financial services company noted in feedback to KushoAI that they had been evaluating AI testing tools for the past year and consistently ran into the challenge of comparing them objectively. They highlighted that APIEval-20 is the first framework they have seen that directly addresses this gap, surfacing shortcomings in agent reasoning that are not visible in demo environments.
Key Benchmark Details
- 20 scenarios across payments, authentication, e-commerce, scheduling, user management, notifications, and search. Each contains 3 to 8 planted bugs across simple, moderate, and complex tiers.
- Binary evaluation against live reference implementations. Scoring weights bug detection at 70%, coverage at 20%, and efficiency at 10%.
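The 70/20/10 weighting implies a simple composite score per scenario. The minimal sketch below illustrates that arithmetic only; the metric names, the normalization of each component to a 0-to-1 range, and the function itself are assumptions for illustration, not KushoAI's published scoring code.

```python
# Illustrative only: the 70/20/10 weights come from the release; the metric
# names, [0, 1] normalization, and this function are assumptions, not the
# benchmark's actual scoring implementation.

def apieval20_composite_score(bugs_found: int,
                              bugs_planted: int,
                              coverage: float,
                              efficiency: float) -> float:
    """Hypothetical composite score for one APIEval-20 scenario.

    bugs_found / bugs_planted -- fraction of planted bugs the agent caught
    coverage                  -- fraction of endpoint behaviors exercised (0..1)
    efficiency                -- e.g. useful requests / total requests (0..1)
    """
    bug_detection = bugs_found / bugs_planted if bugs_planted else 0.0
    return 0.70 * bug_detection + 0.20 * coverage + 0.10 * efficiency

# Example: an agent catches 5 of 8 planted bugs, covers 90% of behaviors,
# and wastes relatively few requests.
print(apieval20_composite_score(5, 8, 0.90, 0.85))  # ~0.70
```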
Benchmark Report: resources.kusho.ai/api-eval-20
Dataset: huggingface.co/datasets/kusho-ai/api-eval-20
About KushoAI
KushoAI is an AI-native API testing and software reliability platform used by 30,000+ engineers across 6,000+ organizations and backed by Antler and Blume Ventures. Visit kusho.ai.
Logo: https://mma.prnewswire.com/media/2948973/KushoAI_Logo.jpg
View original content to download multimedia: https://www.prnewswire.com/news-releases/kushoai-launches-apieval-20-the-first-open-benchmark-for-ai-api-test-generation-302732888.html
SOURCE KushoAI



