Training Large-Scale Foundation Models on Emerging AI Chips

Cited by: 3
Authors
Muhamed, Aashiq [1 ]
Bock, Christian [2 ]
Solanki, Rahul [3 ]
Park, Youngsuk [1 ]
Wang, Yida [1 ]
Huan, Jun [1 ]
Affiliations
[1] AWS AI Labs, Santa Clara, CA 94565 USA
[2] AWS AI Labs, Munich, Germany
[3] AWS Neuron, Cupertino, CA USA
Keywords
AI accelerator; foundation models; TPU; GPU; Trainium
DOI
10.1145/3580305.3599573
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
Foundation models such as ChatGPT and GPT-4 have garnered significant interest from both academia and industry due to their emergent capabilities, such as few-shot prompting, multi-step reasoning, and instruction following. Such capabilities were previously attainable only with specially designed models, such as those using knowledge graphs, but can now be achieved at a much larger scale with foundation models. As the capabilities of foundation models have increased, so too have their sizes, growing at a rate much faster than Moore's law. For example, BERT-Large was released as a 334M-parameter model in 2018, while by 2023 the largest GPT-4 models are estimated to range between 200B and 300B parameters, an increase of roughly three orders of magnitude in just five years. Training foundation models requires massive computing power. For instance, training a BERT model on a single state-of-the-art GPU machine with multiple A100 chips can take several days, while training GPT-3 models on a large multi-instance GPU cluster can take several months to complete the estimated 3 × 10^23 FLOPs. This tutorial provides an overview of the latest progress in supporting foundation model training and inference with new AI chips. It reviews progress on the modeling side, with an emphasis on the transformer architecture, and presents the system architecture supporting training and serving foundation models, including programming frameworks such as PyTorch and TensorFlow, graph compilers, 3D parallelism, and accelerators such as the H100 GPU, TPU, and Trainium. Finally, the tutorial presents our experience of training foundation models using different systems.
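To put the abstract's scale figures in context, the minimal Python sketch below (not part of the paper) reproduces the quoted numbers using the commonly used 6·N·D training-FLOPs approximation and shows how a cluster's accelerator count is typically factored into data-, tensor-, and pipeline-parallel degrees under 3D parallelism. All concrete parameter counts, token counts, and DP/TP/PP degrees are illustrative assumptions, not numbers taken from the tutorial.

import math

# Rough training-cost estimate for a dense transformer using the common
# ~6 * N * D approximation (N = parameter count, D = training tokens).
def training_flops(num_params: float, num_tokens: float) -> float:
    return 6.0 * num_params * num_tokens

# Public GPT-3 estimates (175B parameters, ~300B tokens) give ~3.15e23 FLOPs,
# consistent with the ~3 x 10^23 figure quoted in the abstract.
gpt3_flops = training_flops(175e9, 300e9)
print(f"GPT-3 training estimate: {gpt3_flops:.2e} FLOPs")

# Growth from BERT-Large (334M parameters, 2018) to a ~300B-parameter model (2023):
orders_of_magnitude = math.log10(300e9 / 334e6)
print(f"Growth: ~{orders_of_magnitude:.1f} orders of magnitude in five years")

# 3D parallelism splits a cluster of `world_size` accelerators into
# data-parallel (DP) x tensor-parallel (TP) x pipeline-parallel (PP) groups.
world_size = 512          # e.g. 64 instances x 8 accelerators (assumed)
tp, pp = 8, 4             # tensor- and pipeline-parallel degrees (assumed)
dp = world_size // (tp * pp)
assert dp * tp * pp == world_size
print(f"DP={dp}, TP={tp}, PP={pp}")

The 6·N·D rule ignores attention FLOPs at long sequence lengths and any activation recomputation, so it is only an order-of-magnitude estimate; likewise, real 3D-parallel configurations are tuned per model and cluster topology rather than fixed as in this sketch.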
Pages: 5821-5822
Number of pages: 2