Video Coding Benchmarks

CodeClash Benchmarks LLMs through Multi-Round Coding Competitions

This article introduces practical methods for ...

MiniMax M2.7 Self-Evolving AI Model Shows Gains in Coding Benchmarks

Anthropic Claude Co-work Dispatch runs approved desktop tasks from mobile messages, focused on local execution and data ...

Geeky Gadgets

Anthropic Claude Opus 4.5 Tops Coding Benchmarks While Slashing Token Use

What if the future of coding wasn’t human, but instead powered by an AI so advanced it could outpace even the most skilled developers? Enter Claude Opus 4.5, a model that doesn’t just assist with ...

Bleeping Computer

Grok 4 benchmark results: Tops math, ranks second in coding

Grok 4 is a huge leap from Grok 3, but how good is it compared to other models in the market, such as Gemini 2.5 Pro? We now have answers, thanks to new independent benchmarks. LMArena.ai, which is an ...

MIT Technology Review

How to build a better AI benchmark

To fix the way we test and measure models, AI is learning tricks from social science. It’s not easy being one of Silicon Valley’s favorite benchmarks. SWE-Bench (pronounced “swee bench”) launched in ...

VentureBeat

Microsoft’s GRIN-MoE AI model takes on coding and math, beating competitors in key benchmarks

Microsoft has unveiled a groundbreaking artificial intelligence model, ...

eWeek

Gemini Beats Claude, GPT in Google’s First Android AI Coding Benchmark

