Training Large-Scale Foundation Models on Emerging AI Chips

Cited by: 3
Authors
Muhamed, Aashiq [1 ]
Bock, Christian [2 ]
Solanki, Rahul [3 ]
Park, Youngsuk [1 ]
Wang, Yida [1 ]
Huan, Jun [1 ]
Affiliations
[1] AWS AI Labs, Santa Clara, CA 94565 USA
[2] AWS AI Labs, Munich, Germany
[3] AWS Neuron, Cupertino, CA USA
Keywords
AI accelerator; foundation models; TPU; GPU; Trainium
DOI
10.1145/3580305.3599573
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
Foundation models such as ChatGPT and GPT-4 have garnered significant interest from both academia and industry due to their emergent capabilities, such as few-shot prompting, multi-step reasoning, and instruction following. Such capabilities were previously attainable only with specially designed models, such as those using knowledge graphs, but can now be achieved at a much larger scale with foundation models. As the capabilities of foundation models have increased, so too have their sizes, growing at a rate much faster than Moore's law. For example, BERT-Large was released as a 334M-parameter model in 2018, while by 2023 the largest GPT-4 models are estimated to range between 200B and 300B parameters, an increase of roughly three orders of magnitude in just five years. Training foundation models requires massive computing power. For instance, training a BERT model on a single state-of-the-art GPU machine with multiple A100 chips can take several days, while training GPT-3 models on a large multi-instance GPU cluster can take several months to complete the estimated 3 × 10^23 FLOPs. This tutorial provides an overview of the latest progress in supporting foundation model training and inference with new AI chips. It reviews progress on the modeling side, with an emphasis on the transformer architecture, and presents the system architecture supporting training and serving foundation models, including programming frameworks such as PyTorch and TensorFlow, graph compilers, 3D parallelism, and accelerators such as the H100 GPU, TPU, and Trainium. Finally, the tutorial presents our experience of training foundation models using different systems.
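To put the abstract's scale figures in context, the minimal Python sketch below (not part of the paper) reproduces the quoted numbers using the commonly used 6·N·D training-FLOPs approximation and shows how a cluster's accelerator count is typically factored into data-, tensor-, and pipeline-parallel degrees under 3D parallelism. All concrete parameter counts, token counts, and DP/TP/PP degrees are illustrative assumptions, not numbers taken from the tutorial.

import math

# Rough training-cost estimate for a dense transformer using the common
# ~6 * N * D approximation (N = parameter count, D = training tokens).
def training_flops(num_params: float, num_tokens: float) -> float:
    return 6.0 * num_params * num_tokens

# Public GPT-3 estimates (175B parameters, ~300B tokens) give ~3.15e23 FLOPs,
# consistent with the ~3 x 10^23 figure quoted in the abstract.
gpt3_flops = training_flops(175e9, 300e9)
print(f"GPT-3 training estimate: {gpt3_flops:.2e} FLOPs")

# Growth from BERT-Large (334M parameters, 2018) to a ~300B-parameter model (2023):
orders_of_magnitude = math.log10(300e9 / 334e6)
print(f"Growth: ~{orders_of_magnitude:.1f} orders of magnitude in five years")

# 3D parallelism splits a cluster of `world_size` accelerators into
# data-parallel (DP) x tensor-parallel (TP) x pipeline-parallel (PP) groups.
world_size = 512          # e.g. 64 instances x 8 accelerators (assumed)
tp, pp = 8, 4             # tensor- and pipeline-parallel degrees (assumed)
dp = world_size // (tp * pp)
assert dp * tp * pp == world_size
print(f"DP={dp}, TP={tp}, PP={pp}")

The 6·N·D rule ignores attention FLOPs at long sequence lengths and any activation recomputation, so it is only an order-of-magnitude estimate; likewise, real 3D-parallel configurations are tuned per model and cluster topology rather than fixed as in this sketch.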
Pages: 5821-5822
Number of pages: 2