Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach

Cited by: 3
Authors
He, Ying [1 ]
Fang, Jingcheng [1 ]
Yu, F. Richard [1 ,2 ]
Leung, Victor C. [3 ]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[2] Carleton Univ, Sch Informat Technol, Ottawa, ON K1S 5B6, Canada
[3] Univ British Columbia, Dept Elect Comp Engn, Vancouver V6T 1Z4, BC, Canada
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Computational modeling; Cloud computing; Resource management; Edge computing; Artificial neural networks; Predictive models; Active inference; cloud-edge computing; large language model; reinforcement learning; resource allocation; task offloading;
DOI
10.1109/TMC.2024.3415661
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
With the increasing popularity of and demand for large language model (LLM) applications on mobile devices, it is difficult for resource-limited mobile terminals to run LLM inference tasks efficiently. Traditional deep reinforcement learning (DRL) based approaches have been used to offload LLM inference tasks to servers. However, existing DRL solutions suffer from data inefficiency, insensitivity to latency requirements, and non-adaptability to task load variations, which degrade the performance of LLM applications. In this paper, we propose a novel approach based on active inference for LLM inference task offloading and resource allocation in cloud-edge computing. Extensive simulation results show that the proposed method outperforms mainstream DRL methods, improves data utilization efficiency, and adapts better to changing task load scenarios.
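This record does not include the paper's algorithm, so the sketch below only illustrates the general active-inference idea the abstract alludes to: an agent holds beliefs over each offloading target's load state and picks the action minimizing expected free energy, i.e. a pragmatic cost (expected deviation from a latency preference) plus an epistemic ambiguity term. All targets, latency figures, beliefs, and the preference value are hypothetical, not taken from the paper.

```python
import math

# Candidate offloading targets for an LLM inference task.
TARGETS = ["local", "edge", "cloud"]

# Agent's belief q(s) over each target's load state [low, high].
beliefs = {
    "local": [0.2, 0.8],   # mobile terminal likely busy
    "edge":  [0.6, 0.4],
    "cloud": [0.9, 0.1],   # cloud likely lightly loaded
}

# Expected inference latency (ms) under each load state.
latency = {
    "local": [120.0, 400.0],
    "edge":  [60.0, 180.0],
    "cloud": [90.0, 140.0],  # higher transmission delay, stable compute
}

PREFERRED_LATENCY = 100.0  # soft prior preference over outcomes

def expected_free_energy(target: str) -> float:
    """Pragmatic term: expected squared deviation from the preferred
    latency (scaled). Epistemic term: entropy of the load belief."""
    q = beliefs[target]
    pragmatic = sum(p * ((l - PREFERRED_LATENCY) / 100.0) ** 2
                    for p, l in zip(q, latency[target]))
    ambiguity = -sum(p * math.log(p) for p in q if p > 0)
    return pragmatic + ambiguity

def choose_target() -> str:
    """Offload to the target with minimal expected free energy."""
    return min(TARGETS, key=expected_free_energy)
```

With these made-up numbers the agent prefers the cloud: its load belief is both favorable (low expected latency deviation) and sharp (low ambiguity), whereas the busy local terminal scores worst on both terms.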
Pages: 11253-11264 (12 pages)
Related Papers
50 results
  • [31] Cost Minimization-Oriented Computation Offloading and Service Caching in Mobile Cloud-Edge Computing: An A3C-Based Approach
    Zhou, Huan; Wang, Zhenning; Zheng, Hantong; He, Shibo; Dong, Mianxiong
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2023, 10(3): 1326-1338
  • [32] Game Theory-Based Task Offloading and Resource Allocation for Vehicular Networks in Edge-Cloud Computing
    Jiang, Qinting; Xu, Xiaolong; He, Qiang; Zhang, Xuyun; Dai, Fei; Qi, Lianyong; Dou, Wanchun
    2021 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, ICWS 2021, 2021: 341-346
  • [33] A Task Offloading and Resource Allocation Optimization Method in End-Edge-Cloud Orchestrated Computing
    Peng, Bo; Peng, Shi Lin; Li, Qiang; Chen, Cheng; Zhou, Yu Zhu; Lei, Xiang
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT VI, 2024, 14492: 299-310
  • [34] Joint Optimization of Service Caching Task Offloading and Resource Allocation in Cloud-Edge Cooperative Network
    Tang, Chaogang; Ding, Yao; Xiao, Shuo; Wu, Huaming; Li, Ruidong
    ICC 2024 - IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2024: 4036-4041
  • [35] Optimizing task offloading and resource allocation in edge-cloud networks: a DRL approach
    Ullah, Ihsan; Lim, Hyun-Kyo; Seok, Yeong-Jun; Han, Youn-Hee
    JOURNAL OF CLOUD COMPUTING-ADVANCES SYSTEMS AND APPLICATIONS, 2023, 12(1)
  • [37] Revenue and Energy Efficiency-Driven Delay-Constrained Computing Task Offloading and Resource Allocation in a Vehicular Edge Computing Network: A Deep Reinforcement Learning Approach
    Huang, Xinyu; He, Lijun; Chen, Xing; Wang, Liejun; Li, Fan
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9(11): 8852-8868
  • [38] Efficient task migration and resource allocation in cloud-edge collaboration: A DRL approach with learnable masking
    Wang, Yang; Chen, Juan; Wu, Zongling; Chen, Peng; Li, Xi; Hao, Junfeng
    ALEXANDRIA ENGINEERING JOURNAL, 2025, 111: 107-122
  • [39] A Novel Resource Allocation for Anti-Jamming in Cognitive-UAVs: An Active Inference Approach
    Krayani, Ali; Alam, Atm S.; Marcenaro, Lucio; Nallanathan, Arumugam; Regazzoni, Carlo
    IEEE COMMUNICATIONS LETTERS, 2022, 26(10): 2272-2276
  • [40] Energy-Efficient Task Offloading and Resource Allocation for Delay-Constrained Edge-Cloud Computing Networks
    Wang, Sai; Li, Xiaoyang; Gong, Yi
    IEEE TRANSACTIONS ON GREEN COMMUNICATIONS AND NETWORKING, 2024, 8(1): 514-524