46 entries in total
[41]
Vaswani A, 2017, ADV NEUR IN, V30
[42]
vLLM Team, 2023, Notes on vLLM v.s. DeepSpeed-FastGen
[43]
vLLM Team, 2024, vLLM: Easy, fast, and cheap LLM serving with PagedAttention
[44]
Yu GI, 2022, PROCEEDINGS OF THE 16TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, OSDI 2022, P521
[45]
Zhang SS, 2022, Arxiv, DOI [arXiv:2205.01068, 10.48550/ARXIV.2205.01068]
[46]
Treadmill: Attributing the Source of Tail Latency through Precise Load Testing and Statistical Inference, 2016, 2016 ACM/IEEE 43RD ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), P456-468