Generative Inference of Large Language Models in Edge Computing: An Energy Efficient Approach

Cited by: 1
Authors
Yuan, Xingyu [1 ]
Li, He [1 ]
Ota, Kaoru [1 ]
Dong, Mianxiong [1 ]
Affiliation
[1] Muroran Inst Technol, Dept Sci & Informat, Muroran, Hokkaido, Japan
Source
20TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE, IWCMC 2024 | 2024
Keywords
Large Language Models (LLMs); Energy Efficiency; Intelligent Edge Computing; Resource Allocation
DOI
10.1109/IWCMC61514.2024.10592339
CLC Number
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in generating text and producing fluent, succinct, and precise linguistic expressions. Limited battery life and computing power make it challenging to process LLM inference tasks on mobile devices. Intelligent edge computing offers the opportunity to help users process LLM inference tasks in real time by offloading computations to nearby edge devices. However, due to the undetermined relationship between various task requirements and offloading configurations, inefficient offloading leads to unaffordable additional energy consumption, especially for intelligent tasks. This paper first investigates the energy consumption issue with different offloading configurations and task requirements in an intelligent edge testbed. Based on the preliminary experimental results, we formulate the LLM offloading problem as a multi-armed bandit (MAB) problem and then use an upper confidence bound (UCB) bandit algorithm to find energy-efficient offloading configurations. Extensive simulation results show that our approach enhances the energy efficiency of offloading LLM inference tasks with different requirements in the intelligent edge environment.
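The abstract's core idea, treating each offloading configuration as a bandit arm and selecting via UCB, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the three-arm setup, the Gaussian energy model `noisy_energy`, and the mean costs in `MEAN_ENERGY` are hypothetical stand-ins for measured per-configuration energy, and the reward is simply the negative energy so that standard UCB1 (maximizing reward) minimizes consumption.

```python
import math
import random

def ucb1_select(counts, values, t):
    """Pick the arm with the highest upper confidence bound.

    counts[i]: times arm i was played; values[i]: running mean reward.
    Each arm is played once before the UCB rule kicks in.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))

def run_bandit(energy_fn, n_arms, rounds, seed=0):
    random.seed(seed)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean of rewards (negative energy)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, values, t)
        reward = -energy_fn(arm)  # lower energy => higher reward
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts

# Hypothetical mean energy cost (joules) of three offloading configurations,
# e.g. local execution, near-edge offload, far-edge offload.
MEAN_ENERGY = [5.0, 2.0, 3.5]

def noisy_energy(arm):
    # Noisy per-task energy observation around the configuration's mean.
    return random.gauss(MEAN_ENERGY[arm], 0.3)

counts = run_bandit(noisy_energy, n_arms=3, rounds=500)
```

After 500 simulated tasks, the pull counts concentrate on arm 1, the lowest-energy configuration, while the exploration bonus ensures the other arms are still sampled occasionally.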
Pages: 244-249
Number of pages: 6