Endor Labs today announced the launch of the agentic code security benchmark, extending the existing SusVibes framework from leading academic researchers to evaluate how securely AI coding agents ...
Nearly half of the code that AI assistants write for software teams breaks once it hits real users. That is the central ...
A team of researchers from UC Berkeley has demonstrated that eight AI agent benchmarks can be manipulated to produce ...
SAN FRANCISCO, April 8, 2026 /PRNewswire/ -- KushoAI, an AI-native platform for API testing and software reliability, has introduced APIEval-20, an open benchmark designed to evaluate how effectively ...
Stanford's 2026 AI Index: agents approach human performance, $582B invested, entry-level jobs vanish. The technology is ready ...
KushoAI, an AI-native API testing platform used by 30,000+ engineers across 6,000+ enterprises and high-growth technology ...
As AI-generated code surges, New York-based startup Qodo has raised $70 million in Series B funding to address governance and ...
Researchers at UC San Francisco and Wayne State University prompted generative-AI chatbots to write analysis code for ...
Leaked DeepSeek V4 benchmarks claim a 1M token context and multimodal support, but sources remain unverified and ...
Claude Code launched to the general public in May 2025, and as of February was generating more than $2.5 billion in ...
If you’re following AI news, you’re probably getting whiplash. AI is a gold rush. AI is a bubble. AI is taking your job. AI ...