Multimodal Models - Search News

Beyond Large Language Models: How Multimodal AI Is Unlocking Human-Like Intelligence

The AI industry has long been dominated by text-based large language models (LLMs), but the future lies beyond the written word. Multimodal AI represents the next major wave in artificial intelligence ...

British Journal of Ophthalmology

Publicly available multimodal large language models for ocular surface infections: benchmarking against corneal specialists in triage, diagnosis and treatment

Background/aims Ocular surface infections remain a major cause of visual loss worldwide, yet diagnosis often relies on slow ...

LG Reveals Next-Gen Multimodal AI 'EXAONE 4.5'

LG AI Research today announced the release of EXAONE 4.5, its latest multimodal AI model capable of simultaneously understanding and reasoning across both text and images.

TechCrunch

Meet two open source challengers to OpenAI’s ‘multimodal’ GPT-4V

OpenAI’s GPT-4V is being hailed as the next big thing in AI: a “multimodal” model that can understand both text and images. This has obvious utility, which is why a pair of open source projects have ...

Meta - About Facebook

Introducing Muse Spark: MSL’s First Model, Purpose-Built to Prioritize People

Muse Spark powers a smarter and faster Meta AI assistant, and will be rolling out to WhatsApp, Instagram, Facebook, Messenger ...

9to5Mac

New Apple model combines vision understanding and image generation with impressive results

In the study titled MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer, a team of nearly 30 Apple researchers details a novel unified approach that enables both ...

Semiconductor Engineering

NPU Acceleration For Multimodal LLMs

Transformer-based models have rapidly spread from text to speech, vision, and other modalities. This has created challenges for the development of Neural Processing Units (NPUs). NPUs must now ...

NewsX

Meta Introduces Muse Spark: Multimodal AI Model For Advanced Reasoning—Know How It Is Better Than ChatGPT And Gemini

Meta has launched Muse Spark, a new multimodal AI model aimed at building personal superintelligence. It supports advanced reasoning, multi-agent workflows, and shows strong benchmark performance ...

i-SCOOP

GLM-5V-Turbo: Z.ai’s native multimodal agent model explained

GLM-5V-Turbo is Z.ai's first native multimodal agent foundation model, built for vision-based coding and agentic task ...

11h

Frontier models are failing one in three production attempts — and getting harder to audit

Stanford's 2026 AI Index: frontier models fail one in three attempts, lab transparency is declining, and benchmarks are ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results