HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork

Cited by: 0

Authors
Do, Giang [2 ]
Le, Khiem [3 ]
Pham, Quang [1 ]
TrungTin Nguyen [4 ]
Thanh-Nam Doan
Nguyen, Binh T. [5 ]
Liu, Chenghao [6 ]
Ramasamy, Savitha [1 ]
Li, Xiaoli [1 ]
Hoi, Steven [7 ]
Affiliations
[1] Institute for Infocomm Research (I2R), A*STAR, Singapore
[2] University of Tennessee at Chattanooga, Chattanooga, TN, USA
[3] VinUniversity, Hanoi, Vietnam
[4] Université Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France
[5] AISIA Lab, University of Science, Vietnam National University Ho Chi Minh City, Ho Chi Minh City, Vietnam
[6] Salesforce Research Asia, Beijing, China
[7] Singapore Management University, Singapore
Source
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 2023
Funding
National Research Foundation, Singapore
Keywords
None listed
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
By routing input tokens to only a few split experts, Sparse Mixture-of-Experts has enabled efficient training of large language models. Recent findings suggest that fixing the routers can achieve competitive performance by alleviating the collapsing problem, where all experts eventually learn similar representations. However, this strategy has two key limitations: (i) the policy derived from random routers might be suboptimal, and (ii) it requires extensive resources during training and evaluation, leading to limited efficiency gains. This work introduces HyperRouter, which dynamically generates the router's parameters through a fixed hypernetwork and trainable embeddings to achieve a balance between training the routers and freezing them to learn an improved routing policy. Extensive experiments across a wide range of tasks demonstrate the superior performance and efficiency gains of HyperRouter compared to existing routing methods. Our implementation is publicly available at https://github.com/giangdip2410/HyperRouter.
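To make the mechanism concrete, below is a minimal PyTorch sketch of the idea described in the abstract. It is not the authors' implementation (that is available at the linked repository), and all names and sizes (HyperRouterSketch, emb_dim, d_model, num_experts) are illustrative assumptions: a frozen hypernetwork maps a small trainable embedding to the router's weight matrix, so gradient updates reach only the embedding while the router weights are still generated dynamically.

```python
# Minimal sketch of a hypernetwork-generated router, assuming a standard
# softmax/top-k MoE routing setup. Illustrative only; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperRouterSketch(nn.Module):
    """Router whose weights come from a frozen hypernetwork applied to a
    small trainable embedding; only the embedding receives gradients."""

    def __init__(self, d_model: int, num_experts: int, emb_dim: int = 64):
        super().__init__()
        # Trainable embedding that conditions the hypernetwork.
        self.router_emb = nn.Parameter(torch.randn(emb_dim))
        # Fixed hypernetwork mapping the embedding to the router's
        # parameters (a num_experts x d_model matrix); weights are frozen.
        self.hypernet = nn.Linear(emb_dim, num_experts * d_model)
        for p in self.hypernet.parameters():
            p.requires_grad_(False)
        self.num_experts = num_experts
        self.d_model = d_model

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Generate the router's weight matrix on the fly from the embedding.
        router_weight = self.hypernet(self.router_emb).view(
            self.num_experts, self.d_model
        )
        # Standard routing logits: one score per expert for each token.
        logits = tokens @ router_weight.t()  # (num_tokens, num_experts)
        return F.softmax(logits, dim=-1)


# Usage: score a batch of token representations and pick the top-2 experts.
router = HyperRouterSketch(d_model=512, num_experts=8)
probs = router(torch.randn(4, 512))
top2_scores, top2_experts = probs.topk(2, dim=-1)
```

Under these assumptions, the trainable routing parameters shrink from a full num_experts x d_model matrix to a single emb_dim-sized embedding, which loosely illustrates the balance the abstract describes between fully trained and fully frozen routers.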
Pages: 5754-5765
Number of pages: 12