Use the R packages vitals and ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models.
Google launches Gemini 3.1 Pro with major gains in complex reasoning, multimodal capabilities, and benchmark-leading AI ...