Benchmark LLM Models - Search News

Google’s new AI model doubles reasoning performance

Google Unveils Gemini 3.1 Pro for Advanced Reasoning

Google has announced Gemini 3.1 Pro, an upgraded version of its flagship large language model designed specifically for complex reasoning ...

Hosted on MSN

Google's Gemini 3.1 Pro is here, and it just doubled its reasoning score

Hosted on MSN

Gemini 3.1 Pro: Google’s new AI model doubles reasoning performance

Google launches Gemini 3.1 Pro with enhanced reasoning for complex tasks

Google has announced the rollout of Gemini 3.1 Pro, its latest artificial intelligence model aimed at tackling complex problem-solving tasks with improved reasoning capabilities.

Google rolls out Gemini 3.1 Pro AI model for complex tasks: Details

Google releases Gemini 3.1 Pro: What is it and how is it better

newsbytesapp.com

New Gemini 3.1 Pro is Google's most advanced reasoning model

The Gemini 3.1 Pro model has achieved an ARC-AGI-2 score of 77.1%, which is more than double the reasoning performance of its predecessor, the Gemini 3 Pro.

Google launches Gemini 3.1 Pro with advanced reasoning abilities: Here's how to start using

CNET

Google Rolls Out Latest AI Model, Gemini 3.1 Pro

SiliconANGLE

MLCommons releases new AILuminate benchmark for measuring AI model safety

MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms. It primarily develops benchmarks for measuring the speed ...

Hosted on MSN

AI benchmarks are a bad joke – and LLM makers are the ones laughing

AI companies regularly tout their models' performance on benchmark tests as a sign of technological and intellectual superiority. But those results, widely used in marketing, may not be meaningful. A study from researchers at the Oxford Internet ...

VentureBeat

Google DeepMind researchers introduce new benchmark to improve LLM factuality, reduce hallucinations

Hallucinations, or factually inaccurate responses, continue to plague large language models (LLMs). Models falter particularly when they are given more complex tasks and when users are looking for specific and highly detailed responses. It’s a ...

Automat-it Launches LLM Selection Optimizer to Slash Startup LLM Costs by up to 60%

AWS Premier Tier Partner leverages its AI Services Competency and expertise to help founders cut LLM costs using ...

Security

Simbian launches new security benchmark with AI SOC LLM Leaderboard

Simbian today announced the “AI SOC LLM Leaderboard,” a comprehensive benchmark to measure LLM performance in Security Operations Centers (SOCs). The new benchmark compares LLMs across a diverse range of attacks and SOC tools in a realistic IT ...

SiliconANGLE

Cerebras Systems upgrades its inference service with record performance for Meta’s largest LLM model

Cerebras Systems Inc., an ambitious artificial intelligence computing startup and rival chipmaker to Nvidia Corp., said today that its cloud-based AI large language model inference service can run Meta Platforms Inc.’s largest model at almost 1,000 ...

Taalas Launches Hardcore Chip With ‘Insane’ AI Inference Performance

Taalas has launched an AI accelerator that puts the entire AI model into silicon, delivering 1-2 orders of magnitude greater performance. Seriously.

Virtualization Review

AI's Heavy Hitters: Best Models for Every Task

In today's crowded AI landscape, organizations looking to leverage AI models are faced with an overwhelming number of options. But how to choose? An obvious starting point is the many AI leaderboards that have sprung up. However, while AI ...

Sarvam AI unveils indigenously built 30B and 105B LLMs

Sarvam AI launches two advanced LLMs, with 30B and 105B parameters, that outperform competitors on key benchmarks and focus on Indian-language support.