VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework

Cited by: 1
Authors
Yao, Zhi [1, 2]
Tang, Zhiqing [1 ]
Lou, Jiong [3 ]
Shen, Ping [1 ]
Jia, Weijia [1, 4]
Affiliations
[1] Beijing Normal Univ, Inst Artificial Intelligence & Future Networks, Beijing 519087, Peoples R China
[2] Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[4] BNU HKBU United Int Coll, Guangdong Key Lab & Multi Modal Data Proc, Zhuhai 519087, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, ICWS 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Edge Computing; Quality of Services; Vector Database; Multi-Agent Reinforcement Learning; Large Language Model; Request Scheduling;
DOI
10.1109/ICWS62655.2024.00105
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Large Language Models (LLMs) have gained significant popularity and are used extensively across many domains. Most LLMs are deployed in cloud data centers, where requests encounter substantial response delays and incur high costs, degrading Quality of Services (QoS) at the network edge. Caching LLM request results in a vector database at the edge can substantially reduce the response delay and cost of similar requests, an opportunity that previous research has overlooked. To address this gap, this paper introduces a novel Vector database-assisted cloud-Edge collaborative LLM QoS Optimization (VELO) framework. First, we propose the VELO framework, which uses a vector database to cache the results of some LLM requests at the edge, reducing the response time of subsequent similar requests. Unlike direct optimization of the LLM, VELO does not require altering the LLM's internal structure and is broadly applicable to diverse LLMs. Second, building on the VELO framework, we formulate the QoS optimization problem as a Markov Decision Process (MDP) and devise an algorithm based on Multi-Agent Reinforcement Learning (MARL) that decides whether to request the LLM in the cloud or directly return results from the vector database at the edge. Moreover, to improve request feature extraction and speed up training, we refine the MARL policy network and integrate expert demonstrations. Finally, we implement the proposed algorithm in a real edge system. Experimental results confirm that VELO substantially improves user satisfaction by simultaneously reducing delay and resource consumption for edge users of LLMs.
Pages: 865-876
Page count: 12
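
The edge-caching idea in the abstract (answer a request from the edge vector database when a similar request was already served, otherwise query the cloud LLM and cache the result) can be illustrated with a minimal Python sketch. This is not the authors' method: the fixed cosine-similarity threshold below is a simple stand-in for the MARL policy the paper describes, the embeddings are assumed to be supplied by some external encoder, and the names `EdgeVectorCache`, `answer`, and `call_cloud_llm` are hypothetical.

```python
import math
from typing import Callable, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class EdgeVectorCache:
    """Toy in-memory stand-in for a vector database at the edge (hypothetical)."""

    def __init__(self, threshold: float = 0.9) -> None:
        # Assumption: a fixed threshold replaces the learned MARL decision.
        self.threshold = threshold
        self.entries: List[Tuple[List[float], str]] = []

    def nearest(self, emb: List[float]) -> Tuple[float, Optional[str]]:
        """Return (best similarity, cached response) by linear scan."""
        best_sim, best_resp = -1.0, None
        for cached_emb, resp in self.entries:
            sim = cosine(emb, cached_emb)
            if sim > best_sim:
                best_sim, best_resp = sim, resp
        return best_sim, best_resp

    def answer(self, emb: List[float],
               call_cloud_llm: Callable[[], str]) -> Tuple[str, str]:
        """Serve from the edge cache if similar enough, else ask the cloud."""
        sim, cached = self.nearest(emb)
        if cached is not None and sim >= self.threshold:
            return cached, "edge"
        response = call_cloud_llm()  # slow, costly path
        self.entries.append((list(emb), response))
        return response, "cloud"


cache = EdgeVectorCache(threshold=0.9)
first = cache.answer([1.0, 0.0], lambda: "answer A")     # cache miss
similar = cache.answer([0.99, 0.05], lambda: "unused")   # near-duplicate request
different = cache.answer([0.0, 1.0], lambda: "answer B") # unrelated request
print(first, similar, different)
```

In this sketch the second request is served at the edge because its embedding is nearly parallel to the first, while the third goes to the cloud. A learned policy, as in the paper, would replace the threshold test with a decision that also accounts for delay and resource cost.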