The Importance of Workload Choice in Evaluating LLM Inference Systems

Cited by: 1
Authors
Papaioannou, Konstantinos [1]
Doudali, Thaleia Dimitra [1]
Affiliations
[1] Univ Politecn Madrid, IMDEA Software Inst, Madrid, Spain
Source
PROCEEDINGS OF THE 2024 4TH WORKSHOP ON MACHINE LEARNING AND SYSTEMS, EUROMLSYS 2024 | 2024
Keywords
Large Language Models; Inference; Machine Learning; KV Cache;
DOI
10.1145/3642970.3655823
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The success of Large Language Models (LLMs) across a wide range of applications and use cases has created the need for faster and more scalable LLM inference systems. These systems speed up inference by optimizing scheduling decisions or by managing the available memory more efficiently. However, most of them are evaluated on synthetic datasets and latency-critical scenarios, overlooking a considerable share of real-world use cases and workloads. In response, this paper presents an extensive experimental evaluation that captures the impact of the workload used for evaluation and quantifies the benefit of higher memory availability. Our analysis shows that LLMs achieve 3x higher throughput for text-generation and question-answering use cases than for text summarization and conversational ones; the latter appear to perform worse due to their demanding input sizes. In addition, non-latency-critical inference services achieve 2.3x higher throughput when 4x more memory is available. In conclusion, this paper highlights the importance and impact of workload choice in the evaluation of LLM inference systems.
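To make the workload effect concrete, below is a minimal, self-contained throughput sketch in Python, not the authors' evaluation harness. It uses the Hugging Face transformers generate API with facebook/opt-125m as a small stand-in for the OPT-6.7B/13B and Llama-2-7B/13B models the paper evaluates; the prompt and output lengths are illustrative assumptions meant to mimic a generation-heavy request versus a summarization-like one.

    # Hypothetical throughput sketch (illustrative only, not the paper's harness).
    # Compares generated tokens/sec for two synthetic workload profiles.
    import time
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "facebook/opt-125m"  # small stand-in for the paper's OPT/Llama-2 models
    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL)
    model.eval()

    def tokens_per_second(prompt_tokens: int, new_tokens: int) -> float:
        """Return generated tokens per second for one synthetic request."""
        # Build a synthetic prompt of roughly `prompt_tokens` tokens.
        prompt = " ".join(["hello"] * prompt_tokens)
        inputs = tok(prompt, return_tensors="pt")
        start = time.perf_counter()
        with torch.no_grad():
            out = model.generate(
                **inputs,
                max_new_tokens=new_tokens,
                min_new_tokens=new_tokens,  # force a fixed output length
                do_sample=False,
            )
        elapsed = time.perf_counter() - start
        generated = out.shape[1] - inputs["input_ids"].shape[1]
        return generated / elapsed

    # Short prompt / long output (text-generation-like) versus long prompt /
    # short output (summarization-like); all sizes are made-up assumptions.
    print("generation-like   :", tokens_per_second(prompt_tokens=32, new_tokens=128), "tok/s")
    print("summarization-like:", tokens_per_second(prompt_tokens=512, new_tokens=64), "tok/s")

Under these assumptions, the long-prompt profile spends more time in prefill and holds a larger KV cache per request, which illustrates why input-heavy workloads such as summarization and conversation can reach lower throughput, as the abstract reports.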
Pages: 39-46
Number of pages: 8