LLM Inference Optimization

New LLM optimization technique slashes memory costs up to 75%

Researchers at the Tokyo-based startup Sakana AI have developed a new technique that enables language models to use memory more efficiently, helping enterprises cut the costs of building applications ...

CRN

Nvidia Says New Software Will Double LLM Inference Speed On H100 GPU

The AI chip giant says the open-source software library, TensorRT-LLM, will double the H100’s performance for running inference on leading large language models when it comes out next month. Nvidia ...

Semiconductor Engineering

HW-SW Co-Designed System With 3 Core Optimization Pathways For Long-Context Agentic LLM ...

A new technical paper titled “Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference” was published by researchers at University of Cambridge, Imperial College London ...
