AI Benchmarks for Code

First Benchmark for Legacy Code Comprehension Shows Specialized AI Approach Outperforms General-Purpose Models

LegacyCodeBench tests whether AI can understand COBOL well enough to document it accurately, not just generate plausible ...

Communications of the ACM (Opinion)

When AI Tools Train on AI Output: Model Collapse in Daily Workflows

The degradation is subtle but cumulative. Tools that release frequent updates while training on datasets polluted with ...

Qwen3-Coder-Next offers vibe coders a powerful open source, ultra-sparse model with 10x higher throughput for repo tasks

On SWE-Bench Verified, the model achieved a score of 70.6%. This performance is notably competitive when placed alongside ...

Kimi K2.5 Makes Agent Work 4.5x Faster: Matching Top Models in Vision & Code

Kimi K2.5 handles up to 100 sub-agents and 1,500 tool calls, cutting task time 4.5x so you finish complex work sooner.

Opinion (8d)

AI Benchmarks Investigated: Do Companies Tune Private Builds for Leaderboards, Then Ship Weaker Versions?

AI model testing is being gamed and AI leaderboard rankings can be tricked. An Oxford review found issues in nearly half of ...

State of Testing 2026: Senior Testers Face $20K ‘Specialist Penalty’ for Prioritizing Code Over Strategy

The 13th annual report reveals a 24% income gap between strategic leaders and ICs, while new data shows hands-on AI ...

VentureBeat

Has this stealth startup finally cracked the code on enterprise AI agent reliability? Meet AUI's Apollo-1

For more than a decade, conversational AI has promised human-like assistants that can do more than chat. Yet even as large language models (LLMs) like ChatGPT, Gemini, and Claude learn to reason, ...

Frontiers

AI Era-Informed Innovative Quantitative Research Methods for Social, Behavioral and Cognitive Sciences

The AI revolution has transformed behavioral and cognitive research through unprecedented data volume, velocity, and variety (e.g., neural imaging, ...
