Researchers from Intel Labs and the Weizmann Institute of Science have introduced a major advance in speculative decoding. The new technique, presented at the International Conference on Machine Learning ...
Researchers from the University of Maryland, Lawrence Livermore, Columbia, and TogetherAI have developed a training technique that triples LLM inference speed without requiring auxiliary models or extra infrastructure ...
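Neither excerpt details its method, but both build on the same draft-and-verify loop at the heart of speculative decoding: a cheap draft model proposes several tokens at once, and the expensive target model checks them in a single pass, accepting the longest agreeing prefix. The sketch below illustrates that loop with toy next-token functions; `draft_next`, `target_next`, and `speculative_decode` are illustrative names under assumed greedy decoding, not code from either paper.

```python
def draft_next(ctx):
    # Toy "cheap draft model": always predicts last token + 1.
    return ctx[-1] + 1

def target_next(ctx):
    # Toy "expensive target model": last token + 1, wrapping at 5,
    # so some drafted tokens get rejected at the wrap point.
    return (ctx[-1] + 1) % 5

def greedy(prefix, n):
    # Baseline: decode n tokens with the target model alone.
    out = list(prefix)
    for _ in range(n):
        out.append(target_next(out))
    return out

def speculative_decode(prefix, n, k=3):
    # Decode n tokens; each round drafts k tokens, then verifies them.
    out = list(prefix)
    while len(out) < len(prefix) + n:
        # 1) Draft k tokens cheaply.
        ctx = list(out)
        draft = []
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify left to right: keep drafted tokens only while the
        #    target would have produced the same token at that position.
        ctx = list(out)
        for t in draft:
            if target_next(ctx) == t:
                ctx.append(t)
            else:
                break
        # 3) Append one token from the target itself (a correction on
        #    rejection, a bonus token on full acceptance), so every
        #    target call yields at least one committed token.
        ctx.append(target_next(ctx))
        out = ctx
    return out[:len(prefix) + n]
```

Because acceptance requires exact agreement with the target at every position, the output is identical to decoding with the target alone; the speedup comes from verifying a whole drafted block per target call instead of one token at a time.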