Efficient scaling of large language models with mixture of experts and 3D analog in-memory computing

Cited by: 1
Authors
Buchel, Julian [1 ]
Vasilopoulos, Athanasios [1 ]
Simon, William Andrew [1 ]
Boybat, Irem [1 ]
Tsai, Hsinyu [2 ]
Burr, Geoffrey W. [2 ]
Castro, Hernan [3 ]
Filipiak, Bill [4 ]
Le Gallo, Manuel [1 ]
Rahimi, Abbas [1 ]
Narayanan, Vijay [5 ]
Sebastian, Abu [1 ]
Affiliations
[1] IBM Res Europe, Ruschlikon, Switzerland
[2] IBM Res Almaden, San Jose, CA USA
[3] Micron Technol, Folsom, CA USA
[4] Micron Technol, Novi, MI USA
[5] IBM Thomas J Watson Res Ctr, Yorktown Hts, NY USA
Source
NATURE COMPUTATIONAL SCIENCE, 2025, Vol. 5, Issue 1
Keywords
MEMRISTOR; CHIP;
DOI
10.1038/s43588-024-00753-x
CLC number
TP39 [Computer applications];
Subject classification codes
081203; 0835;
Abstract
Large language models (LLMs), with their remarkable generative capabilities, have greatly impacted a range of fields, but they face scalability challenges due to their large parameter counts, which result in high costs for training and inference. The trend of increasing model sizes is exacerbating these challenges, particularly in terms of memory footprint, latency and energy consumption. Here we explore the deployment of mixture-of-experts (MoE) networks, that is, networks that use conditional computing to keep computational demands low despite having many parameters, on three-dimensional (3D) non-volatile memory (NVM)-based analog in-memory computing (AIMC) hardware. When combined with the MoE architecture, this hardware, which utilizes stacked NVM devices arranged in a crossbar array, offers a solution to the parameter-fetching bottleneck typical of traditional models deployed on conventional von Neumann architectures. By simulating the deployment of MoEs on an abstract 3D AIMC system, we demonstrate that, owing to their conditional compute mechanism, MoEs are inherently better suited to this hardware than conventional, dense model architectures. Our findings suggest that MoEs, in conjunction with emerging 3D NVM-based AIMC, can substantially reduce the inference costs of state-of-the-art LLMs, making them more accessible and energy-efficient.
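The conditional-compute mechanism the abstract refers to is top-k expert routing: a small gating network picks a few experts per token, so only a fraction of the model's parameters is touched in any forward pass. The sketch below is a minimal NumPy illustration of that routing idea only; the layer sizes, top-k value and function names are illustrative assumptions and do not reflect the authors' implementation or the 3D AIMC mapping studied in the paper.

    # Minimal sketch of top-k mixture-of-experts routing (illustrative only;
    # shapes, names and top_k are assumptions, not the paper's code).
    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, top_k = 64, 8, 2
    experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
    gate_w = rng.standard_normal((d_model, n_experts)) * 0.02

    def moe_layer(x: np.ndarray) -> np.ndarray:
        """Route each token in x (shape [tokens, d_model]) to its top-k experts."""
        logits = x @ gate_w                                # [tokens, n_experts]
        top = np.argsort(logits, axis=-1)[:, -top_k:]      # indices of the chosen experts
        sel = np.take_along_axis(logits, top, axis=-1)     # softmax over selected logits only
        weights = np.exp(sel - sel.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out = np.zeros_like(x)
        for t in range(x.shape[0]):                        # per-token dispatch
            for slot in range(top_k):
                e = top[t, slot]
                out[t] += weights[t, slot] * (x[t] @ experts[e])
        return out

    tokens = rng.standard_normal((4, d_model))
    print(moe_layer(tokens).shape)  # (4, 64): only 2 of the 8 expert matrices are used per token

Because only top_k of the n_experts weight matrices are read per token, the weight-fetch traffic scales with the active experts rather than the total parameter count, which is the property the paper exploits when weights are held stationary in 3D NVM crossbar arrays.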
Pages: 13-26
Number of pages: 22
Related papers
50 records in total
  • [31] Exploring the Feasibility of Using 3-D XPoint as an In-Memory Computing Accelerator. Zabihi, Masoud; Resch, Salonik; Cilasun, Husrev; Chowdhury, Zamshed I.; Zhao, Zhengyang; Karpuzcu, Ulya R.; Wang, Jian-Ping; Sapatnekar, Sachin S. IEEE JOURNAL ON EXPLORATORY SOLID-STATE COMPUTATIONAL DEVICES AND CIRCUITS, 2021, 7(2): 88-96.
  • [32] Anisotropic scaling for 3D topological models. Rufo, S.; Griffith, M. A. R.; Lopes, Nei; Continentino, Mucio A. SCIENTIFIC REPORTS, 2021, 11(1).
  • [33] A Heterogeneous Platform for 3D NAND-Based In-Memory Hyperdimensional Computing Engine for Genome Sequencing Applications. Hsu, Po-Kai; Garg, Vaidehi; Lu, Anni; Yu, Shimeng. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, 71(4): 1628-1637.
  • [34] Anisotropic scaling for 3D topological models. Rufo, S.; Griffith, M. A. R.; Lopes, Nei; Continentino, Mucio A. SCIENTIFIC REPORTS, 2021, 11.
  • [35] Overcoming language barriers via machine translation with sparse Mixture-of-Experts fusion of large language models. Zhu, Shaolin; Jian, Dong; Xiong, Deyi. INFORMATION PROCESSING & MANAGEMENT, 2025, 62(3).
  • [36] 2D Molecular Ferroelectric with Large Out-of-plane Polarization for In-Memory Computing. Yao, Jie; Feng, Zi-Jie; Hu, Zhenliang; Xiong, Yu-An; Pan, Qiang; Du, Guo-Wei; Ji, Hao-Ran; Sha, Tai-Ting; Lu, Junpeng; You, Yu-Meng. ADVANCED FUNCTIONAL MATERIALS, 2024, 34(22).
  • [37] Discovering the In-Memory Kernels of 3D Dot-Product Engines. Rashed, Muhammad Rashedul Haq; Jha, Sumit Kumar; Ewetz, Rickard. 2023 28TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2023: 240-245.
  • [38] GP3D: 3D NAND Based In-Memory Graph Processing Accelerator. Shim, Wonbo; Yu, Shimeng. IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2022, 12(2): 500-507.
  • [39] Efficient Ray Tracing of Large 3D Scenes for Mobile Distributed Computing Environments. Seo, Woong; Park, Sanghun; Ihm, Insung. SENSORS, 2022, 22(2).
  • [40] ICE: An Intelligent Cognition Engine with 3D NAND-based In-Memory Computing for Vector Similarity Search Acceleration. Hu, Han-Wen; Wang, Wei-Chen; Chang, Yuan-Hao; Lee, Yung-Chun; Lin, Bo-Rong; Wang, Huai-Mu; Lin, Yen-Po; Huang, Yu-Ming; Lee, Chong-Ying; Su, Tzu-Hsiang; Hsieh, Chih-Chang; Hu, Chia-Ming; Lai, Yi-Ting; Chen, Chung-Kuang; Chen, Han-Sung; Li, Hsiang-Pang; Kuo, Tei-Wei; Chang, Meng-Fan; Wang, Keh-Chung; Hung, Chun-Hsiung; Lu, Chih-Yuan. 2022 55TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), 2022: 763-783.