SANTA CLARA, Calif., March 21, 2023 (GLOBE NEWSWIRE) -- GTC -- NVIDIA today launched four inference platforms optimized for a diverse set of rapidly emerging generative AI applications — helping ...
Flaws replicated from Meta’s Llama Stack to Nvidia TensorRT-LLM, vLLM, SGLang, and others, exposing enterprise AI stacks to systemic risk. Cybersecurity researchers have uncovered a chain of critical ...
Nvidia has set new MLPerf performance benchmarking records on its H200 Tensor Core GPU and TensorRT-LLM software. MLPerf Inference is a benchmarking suite that measures inference performance across ...
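At its core, inference benchmarking of the kind MLPerf formalizes means timing repeated requests and reporting latency percentiles and throughput. A minimal illustrative harness is sketched below; this is not MLPerf itself, whose official LoadGen tool, fixed scenarios, and accuracy requirements are far stricter, and the dummy workload is an assumption for demonstration only:

```python
import time
import statistics

def benchmark(fn, num_iters: int = 100) -> dict:
    """Time repeated calls to fn; report p50/p99 latency and throughput.

    Illustrative only: real MLPerf Inference submissions must use the
    official LoadGen harness and pass accuracy checks.
    """
    latencies = []
    for _ in range(num_iters):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50_s": statistics.median(latencies),
        "p99_s": latencies[int(0.99 * (num_iters - 1))],
        "throughput_qps": num_iters / sum(latencies),
    }

# Example with a dummy "model" that sleeps ~1 ms per call:
stats = benchmark(lambda: time.sleep(0.001), num_iters=50)
print(stats)
```

The same latency/throughput split is why benchmark results are reported per scenario: a server optimized for tail latency and one optimized for offline throughput can rank very differently.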
At its GTC conference, Nvidia today announced Nvidia NIM, a new software platform designed to streamline the deployment of custom and pre-trained AI models into production environments. NIM takes the ...
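NIM containers package a model behind industry-standard serving APIs, including an OpenAI-compatible HTTP endpoint. A hedged sketch of what a client request to such an endpoint looks like follows; the host, port, and model name below are illustrative assumptions, not values from the article, and the actual network call is left commented out since it requires a running container:

```python
import json
import urllib.request

# Hypothetical local endpoint; host, port, and model name are
# assumptions for illustration, not values from the announcement.
url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "meta/llama3-8b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # needs a running NIM container
print(req.get_method())  # POST
```

Because the request shape matches the widely used chat-completions convention, existing client code can typically be pointed at such an endpoint by changing only the base URL.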
A chain of critical vulnerabilities in NVIDIA's Triton Inference Server has been discovered by researchers, just two weeks after a Container Toolkit vulnerability was identified. The Triton Inference ...
Nvidia Corp. said today it’s adding a new, microservices-based layer to its popular Nvidia AI Enterprise platform, giving generative artificial intelligence model developers, platform providers and ...
Apple and NVIDIA shared details of a collaboration to improve the performance of LLMs with a new text generation technique for AI. Cupertino writes: Accelerating LLM inference is an important ML ...
NVIDIA said it has achieved a record large language model (LLM) inference speed, announcing that an NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs achieved more than 1,000 tokens per second ...
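The headline figure above is a decode-throughput metric; a quick arithmetic sketch shows how a per-user token rate translates into generation time. The 1,000 tokens/sec rate is taken from the announcement, while the response length below is a hypothetical example, not a measured value:

```python
def generation_time_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady per-user decode rate."""
    return num_tokens / tokens_per_second

# Hypothetical 500-token response at the announced 1,000 tokens/sec:
print(generation_time_seconds(500, 1000.0))  # 0.5
```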