MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING

Cited by: 0
Authors:
Wiesler, Simon [1 ]
Richard, Alexander [1 ]
Schlueter, Ralf [1 ]
Ney, Hermann [1 ]
Affiliation:
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Aachen, Germany
Source:
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
Keywords:
deep learning; optimization; speech recognition; LVCSR
DOI:
Not available
Chinese Library Classification:
O42 [Acoustics]
Subject Classification Codes:
070206; 082403
Abstract:
Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful to the optimization. We prove convergence of our algorithm in a convex setting. In our experiments we show that the proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows models with a factorized structure to be trained from scratch. We found this structure to be very useful, not only because it accelerates training and decoding, but also because it is a very effective means against overfitting. By combining the proposed optimization algorithm with this model structure, model size can be reduced by a factor of eight while still improving the recognition error rate. Additional gains are obtained by improving the Newbob learning rate strategy.
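For orientation, here is a minimal NumPy sketch of one plausible reading of the mean-normalized update described in the abstract: an affine layer is reparameterized as y = W(x - mu) + b, with mu a running estimate of the layer-input mean, so that the weights always act on (approximately) zero-mean features. Everything concrete below (the function name step, the exponential-moving-average update, and the hyper-parameter values) is an illustrative assumption, not the paper's actual algorithm.

```python
import numpy as np

# Minimal sketch of a mean-normalized update for one affine layer,
# parameterized as y = W (x - mu) + b, where mu is a running estimate of
# the mean of the layer inputs.  Keeping the effective inputs roughly
# zero-mean is the property the abstract argues helps the optimization.

rng = np.random.default_rng(0)

d_in, d_out, batch = 64, 32, 128
W = rng.normal(scale=0.1, size=(d_out, d_in))
b = np.zeros(d_out)
mu = np.zeros(d_in)           # running mean of the inputs (assumed EMA estimate)
lr, momentum = 0.1, 0.99      # illustrative hyper-parameters, not from the paper


def step(x, grad_y):
    """One mean-normalized gradient step for the layer y = W (x - mu) + b.

    x      : (batch, d_in)  layer inputs
    grad_y : (batch, d_out) gradient of the loss w.r.t. the layer output
    """
    global W, b, mu
    # Update the running mean from the mini-batch (exponential moving average).
    mu = momentum * mu + (1.0 - momentum) * x.mean(axis=0)
    xc = x - mu                               # mean-centered inputs
    # Gradients in the mean-normalized parameterization.
    grad_W = grad_y.T @ xc / len(x)
    grad_b = grad_y.mean(axis=0)
    W -= lr * grad_W
    b -= lr * grad_b


# Call pattern with deliberately non-zero-mean features and a dummy
# squared-error gradient, just to show how the pieces fit together.
x = rng.normal(size=(batch, d_in)) + 3.0
target = rng.normal(size=(batch, d_out))
y = (x - mu) @ W.T + b
step(x, grad_y=(y - target))
```

The same centering idea would apply to hidden-layer activations as well, where the running mean has to be re-estimated continually because the activation statistics change as the network trains.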
Pages: 5