POSTER: Pattern-Aware Sparse Communication for Scalable Recommendation Model Training

Cited by: 2
Authors
He, Jiaao [1]
Chen, Shengqi [1]
Zhai, Jidong [1]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM SIGPLAN ANNUAL SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, PPOPP 2024 | 2024
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Distributed Deep Learning; Parallelism;
DOI
10.1145/3627535.3638481
Chinese Library Classification number
TP301 [Theory and methods];
Discipline classification code
081202 ;
Abstract
Recommendation models are an important category of deep learning models whose size is growing enormously. They consist of a sparse part with TBs of memory footprint and a dense part that demands PFLOPs of computing capability to train. Unfortunately, the high cost of the sparse communication needed to re-organize data between the different parallel strategies of the two parts impedes training scalability. Based on observations of sparse access patterns, we design a two-fold fine-grained parallel strategy to accelerate sparse communication. A performance model is built to select an optimal set of items that are replicated across all GPUs so that all-to-all communication volume is reduced, while keeping memory consumption acceptable. The all-to-all overhead is further reduced by parallel scheduling techniques. In our evaluation on 32 GPUs over real-world datasets, a 2.16-16.8x end-to-end speedup is achieved over the baselines.
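The replication idea in the abstract can be illustrated with a minimal sketch. This is not the authors' performance model, which this record does not describe. It assumes a simple greedy rule: replicate the most frequently accessed items that fit in a per-GPU memory budget, then estimate the share of lookups that would still need all-to-all communication. The names `select_replicated`, `remote_fraction`, `row_bytes`, and `budget_bytes` are hypothetical, and the estimate ignores lookups that happen to be local under the original sharding.

```python
from collections import Counter


def select_replicated(access_counts, row_bytes, budget_bytes):
    """Greedily pick the hottest item ids that fit in the per-GPU budget.

    Assumption: every item occupies the same number of bytes (row_bytes).
    """
    max_items = budget_bytes // row_bytes
    # Sort by descending access count; break ties by item id for determinism.
    ranked = sorted(access_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {item for item, _ in ranked[:max_items]}


def remote_fraction(access_counts, replicated):
    """Fraction of lookups that miss the replicated set (upper bound on
    the share of accesses still served through all-to-all)."""
    total = sum(access_counts.values())
    remote = sum(c for item, c in access_counts.items() if item not in replicated)
    return remote / total if total else 0.0


# Toy, skewed access trace: item 0 is looked up far more often than the rest.
counts = Counter([0, 0, 0, 0, 1, 1, 2, 3])
hot = select_replicated(counts, row_bytes=256, budget_bytes=512)
print(sorted(hot), remote_fraction(counts, hot))
```

With a skewed trace like this one, replicating only a couple of hot items removes most remote lookups, which is the kind of trade-off between memory and communication volume the abstract refers to.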
Pages: 466-468
Page count: 3
Related papers
12 items in total
[1] Alimama, 2020, Ad Display/Click Data on Taobao.com
[2] Criteo, 2014, Criteo Display Advertising Challenge
[3] Criteo, 2014, Criteo 1TB Click Logs Dataset
[4]   TorchRec: a PyTorch Domain Library for Recommendation Systems [J].
Ivchenko, Dmytro ;
van der Staay, Dennis ;
Taylor, Colin ;
Liu, Xing ;
Feng, Will ;
Kindi, Rahul ;
Sudarshan, Anirudh ;
Sefati, Shahin .
PROCEEDINGS OF THE 16TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2022, 2022, :482-483
[5]   Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks [J].
Kim, Soojeong ;
Yu, Gyeong-In ;
Park, Hojin ;
Cho, Sungwoo ;
Jeong, Eunji ;
Ha, Hyeonmin ;
Lee, Sanha ;
Jeong, Joo Seong ;
Chun, Byung-Gon .
PROCEEDINGS OF THE FOURTEENTH EUROSYS CONFERENCE 2019 (EUROSYS '19), 2019,
[6]   PERSIA: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters [J].
Lian, Xiangru ;
Yuan, Binhang ;
Zhu, Xuefeng ;
Wang, Yulong ;
He, Yongjun ;
Wu, Honghuan ;
Sun, Lei ;
Lyu, Haodong ;
Liu, Chengjun ;
Dong, Xing ;
Liao, Yiqiao ;
Luo, Mingnan ;
Zhang, Congfei ;
Xie, Jingru ;
Li, Haonan ;
Chen, Lei ;
Huang, Renjie ;
Lin, Jianying ;
Shu, Chengchun ;
Qiu, Xuezhong ;
Liu, Zhishan ;
Kong, Dongying ;
Yuan, Lei ;
Yu, Hai ;
Yang, Sen ;
Zhang, Ce ;
Liu, Ji .
PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, :3288-3298
[7]   Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models [J].
Mudigere, Dheevatsa ;
Hao, Yuchen ;
Huang, Jianyu ;
Jia, Zhihao ;
Tulloch, Andrew ;
Sridharan, Srinivas ;
Liu, Xing ;
Ozdal, Mustafa ;
Nie, Jade ;
Park, Jongsoo ;
Luo, Liang ;
Yang, Jie ;
Gao, Leon ;
Ivchenko, Dmytro ;
Basant, Aarti ;
Hu, Yuxi ;
Yang, Jiyan ;
Ardestani, Ehsan K. ;
Wang, Xiaodong ;
Komuravelli, Rakesh ;
Chu, Ching-Hsiang ;
Yilmaz, Serhat ;
Li, Huayu ;
Qian, Jiyuan ;
Feng, Zhuobo ;
Ma, Yinbin ;
Yang, Junjie ;
Wen, Ellie ;
Li, Hong ;
Yang, Lin ;
Sun, Chonglin ;
Zhao, Whitney ;
Melts, Dimitry ;
Dhulipala, Krishna ;
Kishore, K. R. ;
Graf, Tyler ;
Eisenman, Assaf ;
Matam, Kiran Kumar ;
Gangidi, Adi ;
Chen, Guoqiang Jerry ;
Krishnan, Manoj ;
Nayak, Avinash ;
Nair, Krishnakumar ;
Muthiah, Bharath ;
Khorashadi, Mahmoud ;
Bhattacharya, Pallab ;
Lapukhov, Petr ;
Naumov, Maxim ;
Mathews, Ajit ;
Qiao, Lin .
PROCEEDINGS OF THE 2022 THE 49TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '22), 2022, :993-1011
[8] Naumov M, 2019, arXiv, DOI 10.48550/arXiv.1906.00091
[9]   RecShard: Statistical Feature-Based Memory Optimization for Industry-Scale Neural Recommendation [J].
Sethi, Geet ;
Acun, Bilge ;
Agarwal, Niket ;
Kozyrakis, Christos ;
Trippel, Caroline ;
Wu, Carole-Jean .
ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, 2022, :344-358
[10] Verma Shashank, 2022, Fast, Terabyte-Scale Recommender Training Made Easy with NVIDIA Merlin Distributed-Embeddings