179 entries in total
[61]
Kitaev N., 2020, Proc. the 8th International Conference on Learning Representations
[62]
Korthikanti V A., 2023, Proc. the 6th Conference on Machine Learning and Systems
[63]
Kusumoto M., 2019, Proc. the 33rd International Conference on Neural Information Processing Systems
[64]
Efficient Memory Management for Large Language Model Serving with PagedAttention [J]. Proceedings of the Twenty-Ninth ACM Symposium on Operating Systems Principles, SOSP 2023, 2023: 611-626
[66]
Lepikhin D, 2020, arXiv, DOI arXiv:2006.16668
[67]
Lewis M., 2020, P 58 ANN M ASS COMP, P7871, DOI 10.18653/v1/2020.acl-main.703
[69]
Li CL, 2024, AAAI CONF ARTIF INTE, P18490