AI Computing Systems for Large Language Models Training

Cited by: 0
Authors
Zhang, Zhen-Xing [1 ,2 ]
Wen, Yuan-Bo [2 ]
Lyu, Han-Qi [1 ,2 ,3 ]
Liu, Chang [3 ]
Zhang, Rui [2 ]
Li, Xia-Qing [2 ]
Wang, Chao [1 ]
Du, Zi-Dong [2 ,4 ]
Guo, Qi [2 ]
Li, Ling [5 ]
Zhou, Xue-Hai [1 ]
Chen, Yun-Ji [2 ,6 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230026, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing 100190, Peoples R China
[3] Cambricon Technol, Beijing 100191, Peoples R China
[4] Shanghai Innovat Ctr Processor Technol, Shanghai 201210, Peoples R China
[5] Chinese Acad Sci, Inst Software, Intelligent Software Res Ctr, Beijing 100190, Peoples R China
[6] Univ Chinese Acad Sci, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
artificial intelligence (AI) chip; large language model (LLM); AI computing system; accelerator; EFFICIENT;
DOI
10.1007/s11390-024-4178-1
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
In this paper, we present a comprehensive overview of artificial intelligence (AI) computing systems for large language model (LLM) training. The rapid advancement of LLMs in recent years, coupled with the widespread adoption of algorithms and applications such as BERT, ChatGPT, and DeepSeek, has sparked significant interest in this field. We classify LLMs into encoder-only, encoder-decoder, and decoder-only models, and briefly analyze their training and inference processes to emphasize their substantial need for computational resources. These operations depend heavily on AI-specific accelerators like GPUs (graphics processing units), TPUs (tensor processing units), and MLUs (machine learning units). However, as the gap widens between the increasing complexity of LLMs and the current capabilities of accelerators, it becomes essential to adopt heterogeneous computing systems optimized for distributed environments to manage the growing computational and memory requirements of LLMs. We delve into the execution and scheduling of LLM algorithms, underlining the critical roles of distributed computing strategies, memory management enhancements, and improved computational efficiency. This paper clarifies the complex relationship between algorithm design, hardware infrastructure, and software optimization, and provides an in-depth understanding of both the software and hardware infrastructure supporting LLM training, offering insights into the challenges and potential avenues for future development and deployment.
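The abstract highlights distributed computing strategies as a way to manage the compute and memory demands of LLM training. As a loose, hypothetical illustration (not taken from the paper), the sketch below shows a single data-parallel training step using PyTorch's DistributedDataParallel; the toy stand-in model, sizes, random token batch, and launch setup (torchrun with one process per GPU on a single node) are all assumptions for illustration only.

# Hypothetical sketch, not from the paper: one data-parallel training step
# for a toy causal language model using PyTorch DistributedDataParallel (DDP).
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main() -> None:
    # Assumes launch via torchrun with one process per GPU (NCCL backend),
    # single node, so the process rank equals the local GPU index.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    vocab, d_model = 32000, 512  # toy sizes, for illustration only
    model = nn.Sequential(       # crude stand-in for a decoder-only transformer
        nn.Embedding(vocab, d_model),
        nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
        nn.Linear(d_model, vocab),
    ).cuda(rank)
    ddp_model = DDP(model, device_ids=[rank])  # replicates the model on each rank
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

    # Fake batch of token ids, shape (batch, seq_len); next-token prediction loss.
    tokens = torch.randint(0, vocab, (8, 128), device=f"cuda:{rank}")
    logits = ddp_model(tokens[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1)
    )
    loss.backward()   # DDP all-reduces (averages) gradients across ranks here
    optimizer.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()

In practice, data parallelism alone is insufficient once model states exceed a single accelerator's memory, which is why surveys of LLM training systems also cover tensor/pipeline parallelism and memory optimizations; the above only illustrates the simplest strategy.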
Pages: 6-41
Number of pages: 36