Generative Inference of Large Language Models in Edge Computing: An Energy Efficient Approach

Cited by: 1
Authors
Yuan, Xingyu [1 ]
Li, He [1 ]
Ota, Kaoru [1 ]
Dong, Mianxiong [1 ]
Affiliation
[1] Muroran Inst Technol, Dept Sci & Informat, Muroran, Hokkaido, Japan
Source
20TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE, IWCMC 2024 | 2024
Keywords
Large Language Models (LLMs); Energy Efficiency; Intelligent Edge Computing; Resource Allocation
DOI
10.1109/IWCMC61514.2024.10592339
CLC Number
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in generating text and producing fluent, succinct, and precise linguistic expressions. Limited battery life and computing power make it challenging to process LLM inference tasks on mobile devices. Intelligent edge computing offers the opportunity to help users process LLM inference tasks in real time by offloading computations to nearby edge devices. However, due to the undetermined relationship between various task requirements and offloading configurations, inefficient offloading leads to unaffordable additional energy consumption, especially for intelligent tasks. This paper first investigates the energy consumption issue with different offloading configurations and task requirements in an intelligent edge testbed. Based on the preliminary experimental results, we formulate the LLM offloading problem as a multi-armed bandit (MAB) problem and then use an upper confidence bound (UCB) bandit algorithm to find energy-efficient offloading configurations. Extensive simulation results show that our approach enhances the energy efficiency of offloading LLM inference tasks with different requirements in the intelligent edge environment.
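The abstract's core idea, treating each offloading configuration as a bandit arm and selecting via UCB, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the three-arm setup, the Gaussian energy model `noisy_energy`, and the mean costs in `MEAN_ENERGY` are hypothetical stand-ins for measured per-configuration energy, and the reward is simply the negative energy so that standard UCB1 (maximizing reward) minimizes consumption.

```python
import math
import random

def ucb1_select(counts, values, t):
    """Pick the arm with the highest upper confidence bound.

    counts[i]: times arm i was played; values[i]: running mean reward.
    Each arm is played once before the UCB rule kicks in.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))

def run_bandit(energy_fn, n_arms, rounds, seed=0):
    random.seed(seed)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean of rewards (negative energy)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, values, t)
        reward = -energy_fn(arm)  # lower energy => higher reward
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts

# Hypothetical mean energy cost (joules) of three offloading configurations,
# e.g. local execution, near-edge offload, far-edge offload.
MEAN_ENERGY = [5.0, 2.0, 3.5]

def noisy_energy(arm):
    # Noisy per-task energy observation around the configuration's mean.
    return random.gauss(MEAN_ENERGY[arm], 0.3)

counts = run_bandit(noisy_energy, n_arms=3, rounds=500)
```

After 500 simulated tasks, the pull counts concentrate on arm 1, the lowest-energy configuration, while the exploration bonus ensures the other arms are still sampled occasionally.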
Pages: 244-249
Number of pages: 6