Efficient Inference Offloading for Mixture-of-Experts Large Language Models in Internet of Medical Things

Cited by: 1
Authors
Yuan, Xiaoming [1 ,2 ]
Kong, Weixuan [1 ]
Luo, Zhenyu [1 ]
Xu, Minrui [3 ]
Affiliations
[1] Northeastern Univ Qinhuangdao, Hebei Key Lab Marine Percept Network & Data Proc, Qinhuangdao 066004, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China
[3] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Fund
National Natural Science Foundation of China;
Keywords
large language models; efficient inference offloading; mixture-of-experts; Internet of Medical Things;
DOI
10.3390/electronics13112077
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Despite recent significant advancements in large language models (LLMs) for medical services, the difficulty of deploying LLMs in e-healthcare hinders complex medical applications in the Internet of Medical Things (IoMT). People are also increasingly concerned about e-healthcare risks and privacy protection. Existing LLMs struggle to provide accurate medical questions and answers (Q&As) while meeting the deployment resource constraints of the IoMT. To address these challenges, we propose MedMixtral 8x7B, a new medical LLM based on the mixture-of-experts (MoE) architecture with an offloading strategy, enabling deployment in the IoMT and improving privacy protection for users. Additionally, we find that the significant factors affecting latency include the method of device interconnection, the location of offloading servers, and disk speed.
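The abstract's core idea, running a MoE model on constrained IoMT hardware by keeping only a subset of experts resident in fast memory and fetching the rest on demand, can be illustrated with a toy sketch. This is not the paper's actual MedMixtral implementation; the class names, cache policy (LRU), and transfer counter below are illustrative assumptions, chosen because expert transfers are exactly where the interconnect and disk-speed factors named in the abstract would dominate latency.

```python
import numpy as np
from collections import OrderedDict

RNG = np.random.default_rng(0)

class Expert:
    """A toy feed-forward expert whose weights may live off-device."""
    def __init__(self, dim):
        self.w = RNG.standard_normal((dim, dim)) / np.sqrt(dim)
    def forward(self, x):
        return np.maximum(x @ self.w, 0.0)  # ReLU feed-forward

class OffloadedMoELayer:
    """Keeps only `cache_size` experts 'resident'; others are fetched on demand.

    The cache stands in for accelerator memory; each fetch stands in for a
    host-to-device (or disk-to-host) transfer, the step whose cost depends on
    interconnect, server location, and disk speed.
    """
    def __init__(self, n_experts, dim, cache_size=2, top_k=2):
        self.experts = [Expert(dim) for _ in range(n_experts)]
        self.gate = RNG.standard_normal((dim, n_experts))
        self.cache = OrderedDict()          # expert id -> resident (LRU order)
        self.cache_size = cache_size
        self.top_k = top_k
        self.fetches = 0                    # counts simulated transfers

    def _ensure_resident(self, eid):
        if eid in self.cache:
            self.cache.move_to_end(eid)     # cache hit: refresh LRU position
            return
        if len(self.cache) >= self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used expert
        self.cache[eid] = True
        self.fetches += 1                   # simulated offload transfer

    def forward(self, x):
        scores = x @ self.gate
        top = np.argsort(scores)[-self.top_k:]              # top-k routing
        weights = np.exp(scores[top]) / np.exp(scores[top]).sum()
        out = np.zeros_like(x)
        for w, eid in zip(weights, top):
            self._ensure_resident(int(eid))
            out += w * self.experts[int(eid)].forward(x)
        return out

layer = OffloadedMoELayer(n_experts=8, dim=16, cache_size=2, top_k=2)
for _ in range(5):
    layer.forward(RNG.standard_normal(16))
print("simulated transfers:", layer.fetches)
```

Because routing is input-dependent, the number of transfers varies between `top_k` (perfect cache reuse) and `top_k` per token (no reuse); a real offloading strategy would tune the resident set and prefetching to push this count toward the lower bound.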
Pages: 17
Related Papers
(50 total)
  • [21] Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation
    Chen, Qian
    Zhu, Lei
    He, Hangzhou
    Zhang, Xinliang
    Zeng, Shuang
    Ren, Qiushi
    Lu, Yanye
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VIII, 2024, 15008 : 382 - 392
  • [22] Adaptive mixture-of-experts models for data glove interface with multiple users
    Yoon, Jong-Won
    Yang, Sung-Ihk
    Cho, Sung-Bae
    EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (05) : 4898 - 4907
  • [23] Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models
    Liu, Juncai
    Wang, Jessie Hui
    Jiang, Yimin
    PROCEEDINGS OF THE 2023 ACM SIGCOMM 2023 CONFERENCE, SIGCOMM 2023, 2023, : 486 - 498
  • [24] Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
    Yao, Jinghan
    Anthony, Quentin
    Shafi, Aamir
    Subramoni, Hari
    Panda, Dhabaleswar K.
    PROCEEDINGS 2024 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS 2024, 2024, : 915 - 925
  • [25] AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation
    Jawahar, Ganesh
    Mukherjee, Subhabrata
    Li, Xiaodong
    Kim, Young Jin
    Abdul-Mageed, Muhammad
    Lakshmanan, Laks V. S.
    Awadallah, Ahmed Hassan
    Bubeck, Sebastien
    Gao, Jianfeng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 9116 - 9132
  • [26] Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
    Ryabinin, Max
    Gusev, Anton
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [27] Efficient scaling of large language models with mixture of experts and 3D analog in-memory computing
    Buchel, Julian
    Vasilopoulos, Athanasios
    Simon, William Andrew
    Boybat, Irem
    Tsai, Hsinyu
    Burr, Geoffrey W.
    Castro, Hernan
    Filipiak, Bill
    Le Gallo, Manuel
    Rahimi, Abbas
    Narayanan, Vijay
    Sebastian, Abu
    NATURE COMPUTATIONAL SCIENCE, 2025, 5 (01) : 13 - 26
  • [28] Bayesian shrinkage in mixture-of-experts models: identifying robust determinants of class membership
    Zens, Gregor
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2019, 13 : 1019 - 1051
  • [30] Improving risk classification and ratemaking using mixture-of-experts models with random effects
    Tseung, Spark C.
    Chan, Ian Weng
    Fung, Tsz Chai
    Badescu, Andrei L.
    Lin, X. Sheldon
    JOURNAL OF RISK AND INSURANCE, 2023, 90 (03) : 789 - 820