MixRec: Orchestrating Concurrent Recommendation Model Training on CPU-GPU platform

Times Cited: 0

Authors:

Jiang, Jiazhi [1]

Tian, Rui [1]

Du, Jiangsu [1]

Huang, Dan [1]

Lu, Yutong [1]

Affiliation:

[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China

Source:

2023 IEEE 41ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD | 2023

Funding:

National Key R&D Program of China;

Keywords:

Scheduling; Recommendation; Concurrent Training; CPU/GPU;

DOI:

10.1109/ICCD58817.2023.00062

CLC Number:

TP3 [Computing technology, computer technology];

Discipline Code:

0812;

Abstract:

The development of deep learning recommendation models (DLRMs) and recommendation systems has significantly improved the precision of information matching. Because recommendation models have distinct computation, data-access, and memory-usage characteristics, they can suffer from low resource utilization on prevalent heterogeneous CPU-GPU hardware platforms. Existing concurrent training solutions cannot be applied directly to DLRMs for several reasons, such as insufficiently fine-grained memory management and the lack of collaborative CPU-GPU scheduling. In this paper, we introduce MixRec, a scheduling framework that addresses these challenges with an efficient job management and scheduling mechanism for DLRM training jobs on heterogeneous CPU-GPU platforms. To enable training co-location, we first estimate the peak memory consumption of each job, and we then track and collect the resource utilization of DLRM training jobs. Based on this resource-usage information, we propose a batched job dispatcher with a dynamic resource-complementary scheduling policy that co-locates DLRM training jobs on CPU-GPU platforms. Experimental results show that our implementation achieves up to 4.42x higher throughput and 3.97x higher resource utilization for training jobs involving various recommendation models.
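The abstract describes the batched dispatcher only at a high level. The Python sketch below illustrates what one resource-complementary co-location step could look like, assuming each job has already been profiled into an estimated peak GPU memory and CPU/GPU utilization fractions, and that a group of jobs can share one GPU when its summed peak memory stays within the GPU's capacity and its summed CPU and GPU utilizations each stay at or below 1.0. The `Job` fields, the `fits` and `dispatch_batch` functions, and the greedy pairing heuristic are hypothetical illustrations, not MixRec's published policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    """Hypothetical per-job profile; field names are illustrative, not MixRec's."""
    name: str
    peak_mem_gb: float  # estimated peak GPU memory of the job
    cpu_util: float     # profiled share of host CPU capacity, 0.0-1.0
    gpu_util: float     # profiled share of GPU compute, 0.0-1.0


def fits(group: List[Job], job: Job, mem_capacity_gb: float) -> bool:
    """Return True if adding `job` keeps the group within memory and utilization limits."""
    mem = sum(j.peak_mem_gb for j in group) + job.peak_mem_gb
    cpu = sum(j.cpu_util for j in group) + job.cpu_util
    gpu = sum(j.gpu_util for j in group) + job.gpu_util
    return mem <= mem_capacity_gb and cpu <= 1.0 and gpu <= 1.0


def dispatch_batch(jobs: List[Job], mem_capacity_gb: float) -> List[List[Job]]:
    """Greedily split one batch of jobs into co-location groups.

    Each group is seeded with the most GPU-bound remaining job and then filled
    with the most CPU-bound jobs that still fit, so co-located jobs stress
    complementary resources. A seed job is assumed to fit on its own.
    """
    # Most GPU-bound first, most CPU-bound last.
    pending = sorted(jobs, key=lambda j: j.gpu_util - j.cpu_util, reverse=True)
    groups: List[List[Job]] = []
    while pending:
        group = [pending.pop(0)]
        leftover: List[Job] = []
        # Walk from the CPU-bound end so complementary jobs are tried first.
        for job in reversed(pending):
            if fits(group, job, mem_capacity_gb):
                group.append(job)
            else:
                leftover.append(job)
        pending = leftover[::-1]  # restore GPU-bound-first order
        groups.append(group)
    return groups


# Example batch: two GPU-heavy and two CPU-heavy jobs on a 32 GB GPU.
jobs = [
    Job("dlrm_a", peak_mem_gb=20, cpu_util=0.2, gpu_util=0.7),
    Job("dlrm_b", peak_mem_gb=8, cpu_util=0.7, gpu_util=0.2),
    Job("dlrm_c", peak_mem_gb=18, cpu_util=0.3, gpu_util=0.6),
    Job("dlrm_d", peak_mem_gb=10, cpu_util=0.6, gpu_util=0.3),
]
groups = dispatch_batch(jobs, mem_capacity_gb=32)
print([[j.name for j in g] for g in groups])
# prints [['dlrm_a', 'dlrm_b'], ['dlrm_c', 'dlrm_d']]
```

Seeding each group with the most GPU-bound job and filling it from the CPU-bound end is one simple way to favor complementary pairs; a dynamic policy would also re-profile running jobs and revise the grouping as their utilization changes.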


Pages: 366-374

Page count: 9

References: 27
