Low-Rank Gradient Descent for Memory-Efficient Training of Deep In-Memory Arrays

Cited by: 1

Authors:

Huang, Siyuan [1]

Hoskins, Brian D. [2]

Daniels, Matthew W. [2]

Stiles, Mark D. [2]

Adam, Gina C. [3]

Affiliations:

[1] George Washington Univ, Dept Comp Sci, Washington, DC 20038 USA

[2] Natl Inst Stand & Technol, Gaithersburg, MD USA

[3] George Washington Univ, Dept Elect & Comp Engn, Washington, DC 20052 USA

Source:

ACM JOURNAL ON EMERGING TECHNOLOGIES IN COMPUTING SYSTEMS | 2023, Vol. 19, Issue 2

Keywords:

Deep learning; gradient data decomposition; streaming; principal component analysis

DOI:

10.1145/3577214

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The movement of large quantities of data during the training of a deep neural network presents immense challenges for machine learning workloads, especially those based on future functional memories deployed to store network models. As the size of network models begins to vastly outstrip traditional silicon computing resources, functional memories based on flash, resistive switches, magnetic tunnel junctions, and other technologies can store these new ultra-large models. However, new approaches are then needed to minimize hardware overhead, especially on the movement and calculation of gradient information that cannot be efficiently contained in these new memory resources. To do this, we introduce streaming batch principal component analysis (SBPCA) as an update algorithm. Streaming batch principal component analysis uses stochastic power iterations to generate a stochastic rank-k approximation of the network gradient. We demonstrate that the low-rank updates produced by streaming batch principal component analysis can effectively train convolutional neural networks on a variety of common datasets, with performance comparable to standard mini-batch gradient descent. Our approximation is made in an expanded vector form that can efficiently be applied to the rows and columns of crossbars for array-level updates. These results promise improvements in the design of application-specific integrated circuits based around large vector-matrix multiplier memories.
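The core idea in the abstract, a rank-k approximation of a layer gradient obtained by power iterations, can be illustrated with a minimal NumPy sketch. This is not the authors' SBPCA algorithm: it is plain (non-streaming) block power iteration on a single mini-batch, and the function name `low_rank_gradient`, the argument names, and the iteration count are all assumptions made for illustration. It assumes the layer gradient can be written as G = Dᵀ X, where X holds the batch inputs and D the backpropagated errors, and it never forms G explicitly.

```python
import numpy as np

def low_rank_gradient(X, D, k, n_iter=20, seed=0):
    """Hypothetical helper: rank-k factors of G = D.T @ X without forming G.

    X: (B, n) layer inputs for a batch of size B.
    D: (B, m) backpropagated errors for the same batch.
    Returns U (m, k), s (k,), V (n, k) with G ~= U @ diag(s) @ V.T.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # Random orthonormal starting basis for the row space of G.
    V, _ = np.linalg.qr(rng.standard_normal((n, k)))
    for _ in range(n_iter):
        # G @ V computed as D.T @ (X @ V), then re-orthonormalized.
        U, _ = np.linalg.qr(D.T @ (X @ V))
        # G.T @ U computed as X.T @ (D @ U), then re-orthonormalized.
        V, _ = np.linalg.qr(X.T @ (D @ U))
    # Small k-by-k core U.T @ G @ V; its SVD gives the singular values.
    core = (D @ U).T @ (X @ V)
    Uc, s, Vct = np.linalg.svd(core)
    return U @ Uc, s, V @ Vct.T

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((32, 20))   # batch of 32 inputs, 20 features
    D = rng.standard_normal((32, 10))   # errors for 10 output units
    U, s, V = low_rank_gradient(X, D, k=3)
    W = np.zeros((10, 20))
    lr = 0.01
    # Apply the update as k outer products, one row/column vector pair each.
    for j in range(3):
        W -= lr * s[j] * np.outer(U[:, j], V[:, j])
    print(W.shape)
```

Each pair `(U[:, j], V[:, j])` is one outer-product update, which is the kind of row-and-column vector form the abstract describes for array-level updates on crossbars; only k such vector pairs need to be moved instead of the full m-by-n gradient.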

Pages: 24

References: 81 in total

