Efficient scaling of large language models with mixture of experts and 3D analog in-memory computing

Cited by: 1
Authors
Büchel, Julian [1]
Vasilopoulos, Athanasios [1]
Simon, William Andrew [1]
Boybat, Irem [1]
Tsai, Hsinyu [2]
Burr, Geoffrey W. [2]
Castro, Hernan [3]
Filipiak, Bill [4]
Le Gallo, Manuel [1]
Rahimi, Abbas [1]
Narayanan, Vijay [5]
Sebastian, Abu [1]
Affiliations
[1] IBM Research Europe, Rüschlikon, Switzerland
[2] IBM Research Almaden, San Jose, CA, USA
[3] Micron Technology, Folsom, CA, USA
[4] Micron Technology, Novi, MI, USA
[5] IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Source
Nature Computational Science, 2025, Vol. 5, No. 1
Keywords
memristor; chip
DOI
10.1038/s43588-024-00753-x
Chinese Library Classification (CLC)
TP39 (Applications of Computers)
Discipline Codes
081203; 0835
Abstract
Large language models (LLMs), with their remarkable generative capacities, have greatly impacted a range of fields, but they face scalability challenges due to their large parameter counts, which result in high costs for training and inference. The trend of increasing model sizes is exacerbating these challenges, particularly in terms of memory footprint, latency and energy consumption. Here we explore the deployment of 'mixture of experts' (MoE) networks, which use conditional computing to keep computational demands low despite having many parameters, on three-dimensional (3D) non-volatile memory (NVM)-based analog in-memory computing (AIMC) hardware. When combined with the MoE architecture, this hardware, which utilizes stacked NVM devices arranged in crossbar arrays, offers a solution to the parameter-fetching bottleneck typical of traditional models deployed on conventional von Neumann architectures. By simulating the deployment of MoEs on an abstract 3D AIMC system, we demonstrate that, owing to their conditional compute mechanism, MoEs are inherently better suited to this hardware than conventional, dense model architectures. Our findings suggest that MoEs, in conjunction with emerging 3D NVM-based AIMC, can substantially reduce the inference costs of state-of-the-art LLMs, making them more accessible and energy-efficient.
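For readers unfamiliar with the conditional-compute mechanism the abstract refers to, the following minimal sketch illustrates top-k expert routing as used in typical MoE layers. The layer sizes, the top-2 choice, and all names here are illustrative assumptions, not the configuration studied in the paper; the sketch only shows why, per token, most expert weights are never touched.

```python
# Minimal sketch of top-k mixture-of-experts (MoE) routing.
# Illustrative only: sizes, top-2 routing and all names are assumptions,
# not the paper's configuration.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 64, 256, 8, 2

# Each expert is a small feed-forward block. On 3D AIMC hardware, each
# expert's weight matrices would stay resident in stacked NVM crossbar
# tiers, so no weights are fetched from DRAM at inference time.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                              # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # chosen expert indices
    sel = np.take_along_axis(logits, top, axis=-1)   # their logits
    gates = np.exp(sel - sel.max(-1, keepdims=True)) # softmax over top-k
    gates /= gates.sum(-1, keepdims=True)

    y = np.zeros_like(x)
    for t in range(x.shape[0]):          # conditional compute: only
        for k in range(top_k):           # top_k of n_experts run per token
            w_in, w_out = experts[top[t, k]]
            h = np.maximum(x[t] @ w_in, 0.0)         # ReLU hidden layer
            y[t] += gates[t, k] * (h @ w_out)
    return y

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 64): each token used 2 of 8 experts
```

With weights stationary in the crossbars, activating only top_k of n_experts means only a fraction of the analog tiles perform matrix-vector multiplications per token, so compute and energy scale with top_k rather than with total parameter count, which is the pairing the abstract argues for.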
Pages: 13-26
Related Papers (50 in total; items 21-30 shown)
  • [21] Hsu, Po-Kai; Yu, Shimeng. In-Memory 3D NAND Flash Hyperdimensional Computing Engine for Energy-Efficient SARS-CoV-2 Genome Sequencing. 2022 14th IEEE International Memory Workshop (IMW 2022), 2022: 65-68.
  • [22] Hong, Yining; Zhen, Haoyu; Chen, Peihao; Zheng, Shuhong; Du, Yilun; Chen, Zhenfang; Gan, Chuang. 3D-LLM: Injecting the 3D World into Large Language Models. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [23] Choe, Gihun; Shim, Wonbo; Hur, Jae; Khan, Asif Islam; Yu, Shimeng. Impact of Random Phase Distribution in 3D Vertical NAND Architecture of Ferroelectric Transistors on In-Memory Computing. 2020 International Conference on Simulation of Semiconductor Processes and Devices (SISPAD 2020), 2020: 165-168.
  • [24] Elbtity, Mohammed; Singh, Abhishek; Reidy, Brendan; Guo, Xiaochen; Zand, Ramtin. An In-Memory Analog Computing Co-Processor for Energy-Efficient CNN Inference on Mobile Devices. 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2021), 2021: 188-193.
  • [25] Ohira, Norihiro. Memory-Efficient 3D Connected Component Labeling with Parallel Computing. Signal, Image and Video Processing, 2018, 12(3): 429-436.
  • [27] Du, Yiwei; Tang, Jianshi; Li, Yijun; Xi, Yue; Li, Yuankun; Li, Jiaming; Huang, Heyi; Qin, Qi; Zhang, Qingtian; Gao, Bin; Deng, Ning; Qian, He; Wu, Huaqiang. Monolithic 3D Integration of Analog RRAM-Based Computing-in-Memory and Sensor for Energy-Efficient Near-Sensor Computing. Advanced Materials, 2024, 36(22).
  • [28] Hu, Shiying; Huang, Zengrong; Hu, Chengpeng; Liu, Jialin. 3D Building Generation in Minecraft via Large Language Models. 2024 IEEE Conference on Games (CoG 2024), 2024.
  • [29] Malysiak-Mrozek, Bozena; Zur, Kamil; Mrozek, Dariusz. In-Memory Management System for 3D Protein Macromolecular Structures. Current Proteomics, 2018, 15(3): 175-189.
  • [30] Schrader, Kai; Koenke, Carsten. Hybrid Computing Models for Large-Scale Heterogeneous 3D Microstructures. International Journal for Multiscale Computational Engineering, 2011, 9(4): 365-377.