Efficient Inference Offloading for Mixture-of-Experts Large Language Models in Internet of Medical Things

Cited by: 1
Authors
Yuan, Xiaoming [1 ,2 ]
Kong, Weixuan [1 ]
Luo, Zhenyu [1 ]
Xu, Minrui [3 ]
Affiliations
[1] Northeastern Univ Qinhuangdao, Hebei Key Lab Marine Percept Network & Data Proc, Qinhuangdao 066004, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China
[3] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Fund
National Natural Science Foundation of China;
Keywords
large language models; efficient inference offloading; mixture-of-experts; Internet of Medical Things;
DOI
10.3390/electronics13112077
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Despite recent significant advancements in large language models (LLMs) for medical services, the difficulty of deploying LLMs in e-healthcare hinders complex medical applications in the Internet of Medical Things (IoMT). People are also increasingly concerned about e-healthcare risks and privacy protection. Existing LLMs struggle to provide accurate medical questions and answers (Q&As) while meeting the deployment resource constraints of the IoMT. To address these challenges, we propose MedMixtral 8x7B, a new medical LLM based on the mixture-of-experts (MoE) architecture with an offloading strategy, enabling deployment in the IoMT and improving privacy protection for users. Additionally, we find that the significant factors affecting latency include the method of device interconnection, the location of offloading servers, and disk speed.
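The abstract's core idea, running a MoE model on constrained IoMT hardware by keeping only a subset of experts resident in fast memory and fetching the rest on demand, can be illustrated with a toy sketch. This is not the paper's actual MedMixtral implementation; the class names, cache policy (LRU), and transfer counter below are illustrative assumptions, chosen because expert transfers are exactly where the interconnect and disk-speed factors named in the abstract would dominate latency.

```python
import numpy as np
from collections import OrderedDict

RNG = np.random.default_rng(0)

class Expert:
    """A toy feed-forward expert whose weights may live off-device."""
    def __init__(self, dim):
        self.w = RNG.standard_normal((dim, dim)) / np.sqrt(dim)
    def forward(self, x):
        return np.maximum(x @ self.w, 0.0)  # ReLU feed-forward

class OffloadedMoELayer:
    """Keeps only `cache_size` experts 'resident'; others are fetched on demand.

    The cache stands in for accelerator memory; each fetch stands in for a
    host-to-device (or disk-to-host) transfer, the step whose cost depends on
    interconnect, server location, and disk speed.
    """
    def __init__(self, n_experts, dim, cache_size=2, top_k=2):
        self.experts = [Expert(dim) for _ in range(n_experts)]
        self.gate = RNG.standard_normal((dim, n_experts))
        self.cache = OrderedDict()          # expert id -> resident (LRU order)
        self.cache_size = cache_size
        self.top_k = top_k
        self.fetches = 0                    # counts simulated transfers

    def _ensure_resident(self, eid):
        if eid in self.cache:
            self.cache.move_to_end(eid)     # cache hit: refresh LRU position
            return
        if len(self.cache) >= self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used expert
        self.cache[eid] = True
        self.fetches += 1                   # simulated offload transfer

    def forward(self, x):
        scores = x @ self.gate
        top = np.argsort(scores)[-self.top_k:]              # top-k routing
        weights = np.exp(scores[top]) / np.exp(scores[top]).sum()
        out = np.zeros_like(x)
        for w, eid in zip(weights, top):
            self._ensure_resident(int(eid))
            out += w * self.experts[int(eid)].forward(x)
        return out

layer = OffloadedMoELayer(n_experts=8, dim=16, cache_size=2, top_k=2)
for _ in range(5):
    layer.forward(RNG.standard_normal(16))
print("simulated transfers:", layer.fetches)
```

Because routing is input-dependent, the number of transfers varies between `top_k` (perfect cache reuse) and `top_k` per token (no reuse); a real offloading strategy would tune the resident set and prefetching to push this count toward the lower bound.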
Pages: 17
Related Papers
(50 total)
  • [21] Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation
    Chen, Qian
    Zhu, Lei
    He, Hangzhou
    Zhang, Xinliang
    Zeng, Shuang
    Ren, Qiushi
    Lu, Yanye
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VIII, 2024, 15008 : 382 - 392
  • [22] Adaptive mixture-of-experts models for data glove interface with multiple users
    Yoon, Jong-Won
    Yang, Sung-Ihk
    Cho, Sung-Bae
    EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (05) : 4898 - 4907
  • [23] Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models
    Liu, Juncai
    Wang, Jessie Hui
    Jiang, Yimin
    PROCEEDINGS OF THE 2023 ACM SIGCOMM 2023 CONFERENCE, SIGCOMM 2023, 2023, : 486 - 498
  • [24] Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
    Yao, Jinghan
    Anthony, Quentin
    Shafi, Aamir
    Subramoni, Hari
    Panda, Dhabaleswar K.
    PROCEEDINGS 2024 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS 2024, 2024, : 915 - 925
  • [25] AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation
    Jawahar, Ganesh
    Mukherjee, Subhabrata
    Li, Xiaodong
    Kim, Young Jin
    Abdul-Mageed, Muhammad
    Lakshmanan, Laks V. S.
    Awadallah, Ahmed Hassan
    Bubeck, Sebastien
    Gao, Jianfeng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 9116 - 9132
  • [26] Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
    Ryabinin, Max
    Gusev, Anton
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [27] Efficient scaling of large language models with mixture of experts and 3D analog in-memory computing
    Buchel, Julian
    Vasilopoulos, Athanasios
    Simon, William Andrew
    Boybat, Irem
    Tsai, Hsinyu
    Burr, Geoffrey W.
    Castro, Hernan
    Filipiak, Bill
    Le Gallo, Manuel
    Rahimi, Abbas
    Narayanan, Vijay
    Sebastian, Abu
    NATURE COMPUTATIONAL SCIENCE, 2025, 5 (01) : 13 - 26
  • [28] Bayesian shrinkage in mixture-of-experts models: identifying robust determinants of class membership
    Zens, Gregor
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2019, 13 : 1019 - 1051
  • [30] Improving risk classification and ratemaking using mixture-of-experts models with random effects
    Tseung, Spark C.
    Chan, Ian Weng
    Fung, Tsz Chai
    Badescu, Andrei L.
    Lin, X. Sheldon
    JOURNAL OF RISK AND INSURANCE, 2023, 90 (03) : 789 - 820