Efficient scaling of large language models with mixture of experts and 3D analog in-memory computing

Cited by: 1
Authors
Büchel, Julian [1]
Vasilopoulos, Athanasios [1]
Simon, William Andrew [1]
Boybat, Irem [1]
Tsai, Hsinyu [2]
Burr, Geoffrey W. [2]
Castro, Hernan [3]
Filipiak, Bill [4]
Le Gallo, Manuel [1]
Rahimi, Abbas [1]
Narayanan, Vijay [5]
Sebastian, Abu [1]
Affiliations
[1] IBM Research Europe, Rüschlikon, Switzerland
[2] IBM Research Almaden, San Jose, CA, USA
[3] Micron Technology, Folsom, CA, USA
[4] Micron Technology, Novi, MI, USA
[5] IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Source
Nature Computational Science, 2025, Vol. 5, No. 1
Keywords
memristor; chip
DOI
10.1038/s43588-024-00753-x
Chinese Library Classification (CLC)
TP39 (Applications of Computers)
Discipline Codes
081203; 0835
Abstract
Large language models (LLMs), with their remarkable generative capacities, have greatly impacted a range of fields, but they face scalability challenges due to their large parameter counts, which result in high costs for training and inference. The trend of increasing model sizes is exacerbating these challenges, particularly in terms of memory footprint, latency and energy consumption. Here we explore the deployment of 'mixture of experts' (MoE) networks, which use conditional computing to keep computational demands low despite having many parameters, on three-dimensional (3D) non-volatile memory (NVM)-based analog in-memory computing (AIMC) hardware. When combined with the MoE architecture, this hardware, which utilizes stacked NVM devices arranged in crossbar arrays, offers a solution to the parameter-fetching bottleneck typical of traditional models deployed on conventional von Neumann architectures. By simulating the deployment of MoEs on an abstract 3D AIMC system, we demonstrate that, owing to their conditional compute mechanism, MoEs are inherently better suited to this hardware than conventional, dense model architectures. Our findings suggest that MoEs, in conjunction with emerging 3D NVM-based AIMC, can substantially reduce the inference costs of state-of-the-art LLMs, making them more accessible and energy-efficient.
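For readers unfamiliar with the conditional-compute mechanism the abstract refers to, the following minimal sketch illustrates top-k expert routing as used in typical MoE layers. The layer sizes, the top-2 choice, and all names here are illustrative assumptions, not the configuration studied in the paper; the sketch only shows why, per token, most expert weights are never touched.

```python
# Minimal sketch of top-k mixture-of-experts (MoE) routing.
# Illustrative only: sizes, top-2 routing and all names are assumptions,
# not the paper's configuration.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 64, 256, 8, 2

# Each expert is a small feed-forward block. On 3D AIMC hardware, each
# expert's weight matrices would stay resident in stacked NVM crossbar
# tiers, so no weights are fetched from DRAM at inference time.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                              # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # chosen expert indices
    sel = np.take_along_axis(logits, top, axis=-1)   # their logits
    gates = np.exp(sel - sel.max(-1, keepdims=True)) # softmax over top-k
    gates /= gates.sum(-1, keepdims=True)

    y = np.zeros_like(x)
    for t in range(x.shape[0]):          # conditional compute: only
        for k in range(top_k):           # top_k of n_experts run per token
            w_in, w_out = experts[top[t, k]]
            h = np.maximum(x[t] @ w_in, 0.0)         # ReLU hidden layer
            y[t] += gates[t, k] * (h @ w_out)
    return y

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 64): each token used 2 of 8 experts
```

With weights stationary in the crossbars, activating only top_k of n_experts means only a fraction of the analog tiles perform matrix-vector multiplications per token, so compute and energy scale with top_k rather than with total parameter count, which is the pairing the abstract argues for.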
Pages: 13-26
Related Papers (50 in total; items 21-30 shown)
  • [21] Hsu, Po-Kai; Yu, Shimeng. In-Memory 3D NAND Flash Hyperdimensional Computing Engine for Energy-Efficient SARS-CoV-2 Genome Sequencing. 2022 14th IEEE International Memory Workshop (IMW 2022), 2022: 65-68.
  • [22] Hong, Yining; Zhen, Haoyu; Chen, Peihao; Zheng, Shuhong; Du, Yilun; Chen, Zhenfang; Gan, Chuang. 3D-LLM: Injecting the 3D World into Large Language Models. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [23] Choe, Gihun; Shim, Wonbo; Hur, Jae; Khan, Asif Islam; Yu, Shimeng. Impact of Random Phase Distribution in 3D Vertical NAND Architecture of Ferroelectric Transistors on In-Memory Computing. 2020 International Conference on Simulation of Semiconductor Processes and Devices (SISPAD 2020), 2020: 165-168.
  • [24] Elbtity, Mohammed; Singh, Abhishek; Reidy, Brendan; Guo, Xiaochen; Zand, Ramtin. An In-Memory Analog Computing Co-Processor for Energy-Efficient CNN Inference on Mobile Devices. 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2021), 2021: 188-193.
  • [25] Ohira, Norihiro. Memory-Efficient 3D Connected Component Labeling with Parallel Computing. Signal, Image and Video Processing, 2018, 12(3): 429-436.
  • [27] Du, Yiwei; Tang, Jianshi; Li, Yijun; Xi, Yue; Li, Yuankun; Li, Jiaming; Huang, Heyi; Qin, Qi; Zhang, Qingtian; Gao, Bin; Deng, Ning; Qian, He; Wu, Huaqiang. Monolithic 3D Integration of Analog RRAM-Based Computing-in-Memory and Sensor for Energy-Efficient Near-Sensor Computing. Advanced Materials, 2024, 36(22).
  • [28] Hu, Shiying; Huang, Zengrong; Hu, Chengpeng; Liu, Jialin. 3D Building Generation in Minecraft via Large Language Models. 2024 IEEE Conference on Games (CoG 2024), 2024.
  • [29] Malysiak-Mrozek, Bozena; Zur, Kamil; Mrozek, Dariusz. In-Memory Management System for 3D Protein Macromolecular Structures. Current Proteomics, 2018, 15(3): 175-189.
  • [30] Schrader, Kai; Koenke, Carsten. Hybrid Computing Models for Large-Scale Heterogeneous 3D Microstructures. International Journal for Multiscale Computational Engineering, 2011, 9(4): 365-377.