Accelerating Deep Learning Systems via Critical Set Identification and Model Compression

Cited by: 10
Authors
Han, Rui [1 ]
Liu, Chi Harold [1 ]
Li, Shilin [1 ]
Wen, Shilin [1 ]
Liu, Xue [2 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100811, Peoples R China
[2] McGill Univ, Sch Comp Sci, Montreal, PQ H3A 0G4, Canada
Funding
National Natural Science Foundation of China
Keywords
Training; Data models; Computational modeling; Acceleration; Synchronization; Monte Carlo methods; Deep learning; massive datasets; distributed systems; redundant input data
DOI
10.1109/TC.2020.2970917
CLC Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Modern distributed engines are increasingly deployed to accelerate large-scale deep learning (DL) training jobs. While the parallelism of distributed workers/nodes promises scalability, the computation and communication overheads of the underlying iterative solving algorithms, e.g., stochastic gradient descent, unfortunately become the bottleneck for distributed DL training jobs. Existing approaches address these limitations by designing more efficient synchronization algorithms and model compression techniques, but they do not adequately address the cost of processing massive datasets. In this article, we propose ClipDL, which accelerates deep learning systems by simultaneously decreasing the number of model parameters and restricting computation to critical data only. The core component of ClipDL is the estimation of the critical set, based on the observation that, in many prevalent DL algorithms, a large proportion of the input data has little influence on model parameter updates. We implemented ClipDL on Spark (a popular distributed engine for big data) and BigDL (built on the de facto distributed DL training architecture, the parameter server), and integrated it with representative model compression techniques. Extensive experiments on real DL applications and datasets show that ClipDL accelerates the model training process by an average of 2.32 times while incurring an accuracy loss of only 1.86 percent.
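The core idea of training only on a "critical" subset of each mini-batch can be illustrated with a short sketch. This is a minimal, hypothetical PyTorch illustration rather than the paper's ClipDL implementation (which runs on Spark/BigDL with a parameter server): it assumes per-sample loss magnitude as the influence criterion, and the function name critical_set_step and the keep_ratio parameter are illustrative, not taken from the paper.

```python
# Minimal sketch of critical-set filtering, assuming per-sample loss
# magnitude approximates a sample's influence on parameter updates.
import torch
import torch.nn as nn

def critical_set_step(model, optimizer, inputs, targets, keep_ratio=0.3):
    """Update the model using only the 'critical' fraction of the mini-batch."""
    criterion = nn.CrossEntropyLoss(reduction="none")

    # 1. Cheap forward pass to score every sample (no gradients needed).
    with torch.no_grad():
        per_sample_loss = criterion(model(inputs), targets)

    # 2. Keep only the samples with the largest losses (hypothetical criterion).
    k = max(1, int(keep_ratio * inputs.size(0)))
    idx = torch.topk(per_sample_loss, k).indices

    # 3. Standard forward/backward pass restricted to the critical set.
    optimizer.zero_grad()
    loss = criterion(model(inputs[idx]), targets[idx]).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```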
Pages: 1059 - 1070
Number of pages: 12