VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework

Cited by: 1
Authors
Yao, Zhi [1, 2]
Tang, Zhiqing [1]
Lou, Jiong [3]
Shen, Ping [1]
Jia, Weijia [1, 4]
Affiliations
[1] Beijing Normal Univ, Inst Artificial Intelligence & Future Networks, Beijing 519087, Peoples R China
[2] Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[4] BNU HKBU United Int Coll, Guangdong Key Lab & Multi Modal Data Proc, Zhuhai 519087, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Edge Computing; Quality of Services; Vector Database; Multi-Agent Reinforcement Learning; Large Language Model; Request Scheduling;
DOI
10.1109/ICWS62655.2024.00105
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Large Language Models (LLMs) have gained significant popularity and are used extensively across various domains. Most LLMs are deployed in cloud data centers, where requests encounter substantial response delays and incur high costs, degrading the Quality of Services (QoS) at the network edge. Using a vector database to cache LLM request results at the edge can substantially reduce the response delay and cost of similar requests, an opportunity that previous research has overlooked. To address this gap, this paper introduces a novel Vector database-assisted cloud-Edge collaborative LLM QoS Optimization (VELO) framework. First, we propose the VELO framework, which employs a vector database to cache the results of some LLM requests at the edge and thereby reduce the response time of subsequent similar requests. Unlike approaches that optimize the LLM directly, VELO does not require altering the internal structure of the LLM and is broadly applicable to diverse LLMs. Second, building on the VELO framework, we formulate the QoS optimization problem as a Markov Decision Process (MDP) and devise an algorithm based on Multi-Agent Reinforcement Learning (MARL) that decides whether to request the LLM in the cloud or to return results directly from the vector database at the edge. Moreover, to enhance request feature extraction and expedite training, we refine the MARL policy network and integrate expert demonstrations. Finally, we implement the proposed algorithm in a real edge system. Experimental results confirm that the VELO framework substantially enhances user satisfaction by simultaneously reducing delay and resource consumption for edge users of LLMs.
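The edge-caching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses a MARL policy to decide between the edge cache and the cloud LLM, whereas this sketch substitutes a fixed cosine-similarity threshold. The names `embed`, `EdgeCache`, `cloud_llm`, and `handle_request` are hypothetical, the bag-of-words embedding is a toy stand-in for a real embedding model, and the in-memory list stands in for a real vector database.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class EdgeCache:
    """In-memory stand-in for a vector database at the edge (assumption)."""

    def __init__(self, threshold=0.8):
        # Fixed threshold replaces the learned MARL decision in the paper.
        self.threshold = threshold
        self.entries = []  # list of (vector, response) pairs

    def lookup(self, query):
        """Return the cached response of the most similar request, or None."""
        query_vec = embed(query)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, query, response):
        self.entries.append((embed(query), response))


def cloud_llm(query):
    """Hypothetical placeholder for a slow, costly cloud LLM call."""
    return "cloud answer to: " + query


def handle_request(cache, query):
    """Serve from the edge cache on a similar hit, else call the cloud."""
    cached = cache.lookup(query)
    if cached is not None:
        return cached, "edge"
    response = cloud_llm(query)
    cache.store(query, response)
    return response, "cloud"
```

In this sketch, a repeated or near-identical request is answered from the edge without a cloud round trip, while a dissimilar request falls through to the cloud and its result is cached for later reuse. A learned policy, as described in the abstract, would replace the threshold test in `lookup`.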
Pages: 865 - 876
Number of pages: 12