Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach

Times cited: 3
Authors
He, Ying [1]
Fang, Jingcheng [1]
Yu, F. Richard [1, 2]
Leung, Victor C. [3]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[2] Carleton Univ, Sch Informat Technol, Ottawa, ON K1S 5B6, Canada
[3] Univ British Columbia, Dept Elect Comp Engn, Vancouver V6T 1Z4, BC, Canada
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Computational modeling; Cloud computing; Resource management; Edge computing; Artificial neural networks; Predictive models; Active inference; cloud-edge computing; large language model; reinforcement learning; resource allocation; task offloading;
DOI
10.1109/TMC.2024.3415661
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
With the growing popularity of, and demand for, large language model (LLM) applications on mobile devices, resource-limited mobile terminals struggle to run large-model inference tasks efficiently. Traditional deep reinforcement learning (DRL)-based approaches have been used to offload LLM inference tasks to servers. However, existing DRL solutions suffer from data inefficiency, insensitivity to latency requirements, and a lack of adaptability to task load variations, all of which degrade LLM performance. In this paper, we propose a novel approach based on active inference for LLM inference task offloading and resource allocation in cloud-edge computing. Extensive simulation results show that the proposed method outperforms mainstream DRL methods, achieves higher data utilization efficiency, and adapts better to changing task load scenarios.
Pages: 11253-11264
Page count: 12
Related papers
50 in total
  • [1] Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Networks: An Active Inference Approach
    Fang, Jingcheng
    He, Ying
    Yu, F. Richard
    Li, Jianqiang
    Leung, Victor C.
    2023 IEEE 98TH VEHICULAR TECHNOLOGY CONFERENCE, VTC2023-FALL, 2023
  • [2] An Adaptive Computing Offloading and Resource Allocation Strategy for Internet of Vehicles Based on Cloud-Edge Collaboration
    Shu, Wanneng
    Yu, Haoxin
    Zhai, Cao
    Feng, Xuanxuan
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024
  • [3] A Near-Optimal Approach for Online Task Offloading and Resource Allocation in Edge-Cloud Orchestrated Computing
    Liu, Tong
    Fang, Lu
    Zhu, Yanmin
    Tong, Weiqin
    Yang, Yuanyuan
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2022, 21 (08): 2687-2700
  • [4] Dynamic Task Offloading and Resource Allocation for Mobile-Edge Computing in Dense Cloud RAN
    Zhang, Qi
    Gui, Lin
    Hou, Fen
    Chen, Jiacheng
    Zhu, Shichao
    Tian, Feng
    IEEE INTERNET OF THINGS JOURNAL, 2020, 7 (04): 3282-3299
  • [5] Learn to Coordinate for Computation Offloading and Resource Allocation in Edge Computing: A Rational-Based Distributed Approach
    Liu, Zhicheng
    Zhao, Yunfeng
    Song, Jinduo
    Qiu, Chao
    Chen, Xu
    Wang, Xiaofei
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2022, 9 (05): 3136-3151
  • [6] Incentive-driven Computation Offloading and Resource Allocation in Mobile Cloud-Edge Computing
    Li, Mingze
    Wu, Tong
    Zhou, Huan
    Zhao, Liang
    Leung, Victor C. M.
    2022 IEEE 42ND INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS WORKSHOPS (ICDCSW), 2022: 157-162
  • [7] Dynamic Resource Allocation for Cloud-Edge Collaboration Offloading in VEC Networks With Diverse Tasks
    Geng, Jingwei
    Qin, Zaiming
    Jin, Shunfu
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024: 21235-21251
  • [8] Generative Inference of Large Language Models in Edge Computing: An Energy Efficient Approach
    Yuan, Xingyu
    Li, He
    Ota, Kaoru
    Dong, Mianxiong
    20TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE, IWCMC 2024, 2024: 244-249
  • [9] Reverse Auction-Based Computation Offloading and Resource Allocation in Mobile Cloud-Edge Computing
    Zhou, Huan
    Wu, Tong
    Chen, Xin
    He, Shibo
    Guo, Deke
    Wu, Jie
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2023, 22 (10): 6144-6159
  • [10] Cloud/Edge Computing Resource Allocation and Pricing for Mobile Blockchain: An Iterative Greedy and Search Approach
    Fan, Yuqi
    Wang, Lunfei
    Wu, Weili
    Du, Dingzhu
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2021, 8 (02): 451-463