Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach

Cited by: 3
Authors
He, Ying [1]
Fang, Jingcheng [1]
Yu, F. Richard [1, 2]
Leung, Victor C. [3]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[2] Carleton Univ, Sch Informat Technol, Ottawa, ON K1S 5B6, Canada
[3] Univ British Columbia, Dept Elect Comp Engn, Vancouver V6T 1Z4, BC, Canada
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Computational modeling; Cloud computing; Resource management; Edge computing; Artificial neural networks; Predictive models; Active inference; cloud-edge computing; large language model; reinforcement learning; resource allocation; task offloading;
DOI
10.1109/TMC.2024.3415661
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the growing popularity of, and demand for, large language model applications on mobile devices, resource-limited mobile terminals struggle to run large-model inference tasks efficiently. Traditional deep reinforcement learning (DRL) based approaches have been used to offload large language model (LLM) inference tasks to servers. However, existing DRL solutions suffer from data inefficiency, insensitivity to latency requirements, and poor adaptability to task load variations, all of which degrade LLM performance. In this paper, we propose a novel approach based on active inference for LLM inference task offloading and resource allocation in cloud-edge computing. Extensive simulation results show that the proposed method outperforms mainstream DRL methods, improves data utilization efficiency, and adapts better to changing task loads.
Pages: 11253 - 11264
Page count: 12
Related papers
50 in total
  • [21] Adaptive Data Sharing and Computation Offloading in Cloud-Edge Computing with Resource Constraints
    Chu, Wenjie
    Zhao, Haiyan
    Jin, Zhi
    Hu, Zhenjiang
    2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020, : 2842 - 2849
  • [22] Toward Mobility-Aware Computation Offloading and Resource Allocation in End-Edge-Cloud Orchestrated Computing
    Dai, Bin
    Niu, Jianwei
    Ren, Tao
    Atiquzzaman, Mohammed
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (19) : 19450 - 19462
  • [23] Time-Slotted Task Offloading and Resource Allocation for Cloud-Edge-End Cooperative Computing Networks
    Fan, Wenhao
    Liu, Xun
    Yuan, Hao
    Li, Nan
    Liu, Yuan'an
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (08) : 8225 - 8241
  • [24] Deep Reinforcement Learning Based Resource Allocation Strategy in Cloud-Edge Computing System
    Xu, Zhuohan
    Zhong, Zeheng
    Shi, Bing
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [25] A Multi-Objective Evolutionary Approach: Task Offloading and Resource Allocation Using Enhanced Decomposition-Based Algorithm in Mobile Edge Computing
    Yu, Chunyang
    Yong, Yibo
    Liu, Yang
    Cheng, Jian
    Tong, Qiang
    IEEE ACCESS, 2024, 12 : 123640 - 123655
  • [26] A hierarchical optimization approach for industrial task offloading and resource allocation in edge computing systems
    Dong, Jiadong
    Chen, Lin
    Zheng, Chunxiang
    Pan, Kai
    Guo, Qinghu
    Wu, Shunfeng
    Wang, Zhaoxiang
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (05) : 5953 - 5979
  • [27] Model-Based Comparison of Cloud-Edge Computing Resource Allocation Policies
    Jiang, Lili
    Chang, Xiaolin
    Yang, Runkai
    Misic, Jelena
    Misic, Vojislav B.
    COMPUTER JOURNAL, 2020, 63 (10) : 1564 - 1583
  • [28] HTR: A Joint Approach for Task Offloading and Resource Allocation in Mobile Edge Computing
    Wang, Zilong
    Du, Hongwei
    Ye, Qiang
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021), 2021,
  • [29] Model-based comparison of cloud-edge computing resource allocation policies
    Jiang L.
    Chang X.
    Yang R.
    Mišić J.
    Mišić V.B.
    Computer Journal, 2020, 63 (10) : 1564 - 1583
  • [30] A Cloud-Edge Collaborative Computing Task Scheduling and Resource Allocation Algorithm for Energy Internet Environment
    Song, Xin
    Wang, Yue
    Xie, Zhigang
    Xia, Lin
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2021, 15 (06) : 2282 - 2303