Large Language Models (LLMs) Inference Offloading and Resource Allocation in Cloud-Edge Computing: An Active Inference Approach

Cited by: 3
Authors
He, Ying [1]
Fang, Jingcheng [1]
Yu, F. Richard [1, 2]
Leung, Victor C. [3]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[2] Carleton Univ, Sch Informat Technol, Ottawa, ON K1S 5B6, Canada
[3] Univ British Columbia, Dept Elect Comp Engn, Vancouver V6T 1Z4, BC, Canada
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Computational modeling; Cloud computing; Resource management; Edge computing; Artificial neural networks; Predictive models; Active inference; cloud-edge computing; large language model; reinforcement learning; resource allocation; task offloading;
DOI
10.1109/TMC.2024.3415661
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the growing popularity of and demand for large language model (LLM) applications on mobile devices, it is difficult for resource-limited mobile terminals to run large-model inference tasks efficiently. Traditional deep reinforcement learning (DRL) based approaches have been used to offload LLM inference tasks to servers. However, existing DRL solutions suffer from data inefficiency, insensitivity to latency requirements, and non-adaptability to task load variations, which degrades LLM performance. In this paper, we propose a novel approach based on active inference for LLM inference task offloading and resource allocation in cloud-edge computing. Extensive simulation results show that our proposed method outperforms mainstream DRL algorithms, improves data utilization efficiency, and is more adaptable to changing task load scenarios.
Pages: 11253-11264
Page count: 12
Related papers
50 items in total
  • [41] SDN-Based Resource Allocation in Edge and Cloud Computing Systems: An Evolutionary Stackelberg Differential Game Approach
    Du, Jun
    Jiang, Chunxiao
    Benslimane, Abderrahim
    Guo, Song
    Ren, Yong
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2022, 30 (04) : 1613 - 1628
  • [42] QoS-Aware Augmented Reality Task Offloading and Resource Allocation in Cloud-Edge Collaboration Environment
    Hao, Jia
    Chen, Yang
    Gan, Jianhou
    JOURNAL OF NETWORK AND SYSTEMS MANAGEMENT, 2025, 33 (01)
  • [43] A Quantum Reinforcement Learning Approach for Joint Resource Allocation and Task Offloading in Mobile Edge Computing
    Wei, Xinliang
    Gao, Xitong
    Ye, Kejiang
    Xu, Cheng-Zhong
    Wang, Yu
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2025, 24 (04) : 2580 - 2593
  • [44] A Bilevel Optimization Approach for Joint Offloading Decision and Resource Allocation in Cooperative Mobile Edge Computing
    Huang, Pei-Qiu
    Wang, Yong
    Wang, Kezhi
    Liu, Zhi-Zhong
    IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (10) : 4228 - 4241
  • [45] Online Optimal Service Selection, Resource Allocation and Task Offloading for Multi-Access Edge Computing: A Utility-Based Approach
    Chu, Weibo
    Yu, Peijie
    Yu, Zhiwen
    Lui, John C. S.
    Lin, Yi
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2023, 22 (07) : 4150 - 4167
  • [46] Joint Power Control and Resource Allocation With Task Offloading for Collaborative Device-Edge-Cloud Computing Systems
    Xie, Shumin
    Li, Kangshun
    Wang, Wenxiang
    Wang, Hui
    Jalil, Hassan
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2024, 2024
  • [47] Joint Offloading Decision and Resource Allocation for Vehicular Fog-Edge Computing Networks: A Contract-Stackelberg Approach
    Li, Yuwei
    Yang, Bo
    Wu, Hao
    Han, Qiaoni
    Chen, Cailian
    Guan, Xinping
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (17) : 15969 - 15982
  • [48] Joint Optimization of Task Offloading and Resource Allocation for UAV-Assisted Edge Computing: A Stackelberg Bilayer Game Approach
    Wang, Peng
    Chen, Guifen
    Sun, Zhiyao
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (09) : 1174 - 1181
  • [49] Delay-aware resource allocation for partial computation offloading in mobile edge cloud computing
    Yu, Lingfei
    Xu, Hongliu
    Zeng, Yunhao
    Deng, Jiali
    PERVASIVE AND MOBILE COMPUTING, 2024, 105
  • [50] Joint Computation Offloading and Resource Allocation in Mobile-Edge Cloud Computing: A Two-Layer Game Approach
    He, Zhenli
    Guo, Ying
    Zhai, Xiaolong
    Zhao, Mingxiong
    Zhou, Wei
    Li, Keqin
    IEEE TRANSACTIONS ON CLOUD COMPUTING, 2025, 13 (01) : 411 - 428