Google says that its most advanced thinking model yet outperforms Claude and ChatGPT on Humanity's Last Exam and other key ...
Hugging Face has launched Community Evals, a feature that enables benchmark datasets on the Hub to host their own ...
According to Anthropic, "Claude Sonnet 4.6 is our most capable Sonnet model yet." The company says Sonnet 4.6 has a 1 million token context window in beta. Crucially, Anthropic reports that Sonnet 4.6 ...