MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING

Cited by: 0
Authors
Wiesler, Simon [1 ]
Richard, Alexander [1 ]
Schlueter, Ralf [1 ]
Ney, Hermann [1 ]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Aachen, Germany
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
deep learning; optimization; speech recognition; LVCSR;
DOI
not available
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful to the optimization. We prove convergence of our algorithm in a convex setting. In our experiments, we show that the proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows training models with a factorized structure from scratch. We found this structure very useful, not only because it accelerates training and decoding, but also because it is a very effective means against overfitting. Combining the proposed optimization algorithm with this model structure, model size can be reduced by a factor of eight while still improving recognition error rate. Additional gains are obtained by improving the Newbob learning-rate strategy.
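The abstract's central claim, that a non-zero feature mean hurts optimization and can be countered by jointly estimating and subtracting that mean during SGD, can be illustrated on a toy linear least-squares model. This is a minimal sketch of the general idea only, not the paper's actual layer reparameterization; all variable names and hyperparameters here are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer y = W (x - mu) + b trained with SGD, where mu is a
# running estimate of the input mean that is updated alongside W and b.
n_features, n_outputs = 5, 1
W = np.zeros((n_outputs, n_features))
b = np.zeros(n_outputs)

# Synthetic data with a strongly non-zero feature mean (loc=10), the
# regime where plain SGD on uncentered inputs is ill-conditioned.
X = rng.normal(loc=10.0, scale=1.0, size=(1000, n_features))
true_w = rng.normal(size=(n_outputs, n_features))
Y = X @ true_w.T + 0.01 * rng.normal(size=(1000, n_outputs))

mu = X[:32].mean(axis=0)          # warm-start the mean from the first batch
lr, mom = 0.05, 0.99              # learning rate; mean-smoothing factor
for epoch in range(50):
    for i in range(0, len(X), 32):
        xb, yb = X[i:i + 32], Y[i:i + 32]
        mu = mom * mu + (1 - mom) * xb.mean(axis=0)  # track the input mean
        xc = xb - mu                                 # mean-normalized input
        err = xc @ W.T + b - yb                      # prediction error
        W -= lr * err.T @ xc / len(xb)               # SGD step on centered data
        b -= lr * err.mean(axis=0)                   # bias absorbs mu @ true_w

pred = (X - mu) @ W.T + b
mse = float(np.mean((pred - Y) ** 2))
```

Because the weight updates see only the centered inputs, their effective curvature no longer contains the large rank-one term contributed by the feature mean, which is the conditioning benefit the abstract refers to.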
Pages: 5
Related Papers
50 records in total
  • [41] Segmentation in large-scale cellular electron microscopy with deep learning: A literature survey
    Aswath, Anusha
    Alsahaf, Ahmad
    Giepmans, Ben N. G.
    Azzopardi, George
    MEDICAL IMAGE ANALYSIS, 2023, 89
  • [42] Non-intrusive model reduction of large-scale, nonlinear dynamical systems using deep learning
    Gao, Han
    Wang, Jian-Xun
    Zahr, Matthew J.
    PHYSICA D-NONLINEAR PHENOMENA, 2020, 412
  • [43] Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training
    Choi, Hyeonseong
    Lee, Jaehwan
    APPLIED SCIENCES-BASEL, 2021, 11 (21)
  • [44] Implementation of a Large-Scale Image Curation Workflow Using Deep Learning Framework
    Domalpally, Amitha
    Slater, Robert
    Barrett, Nancy
    Voland, Rick
    Balaji, Rohit
    Heathcote, Jennifer
    Channa, Roomasa
    Blodi, Barbara
    OPHTHALMOLOGY SCIENCE, 2022, 2 (04)
  • [45] A Data-Centric Approach for Analyzing Large-Scale Deep Learning Applications
    Vineet, S. Sai
    Joseph, Natasha Meena
    Korgaonkar, Kunal
    Paul, Arnab K.
    PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING AND NETWORKING, ICDCN 2023, 2023 : 282 - 283
  • [46] Deep Learning loss model for large-scale low voltage smart grids
    Velasco, Jose Angel
    Amaris, Hortensia
    Alonso, Monica
    INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2020, 121
  • [47] swFLOW: A large-scale distributed framework for deep learning on Sunway TaihuLight supercomputer
    Li, Mingfan
    Lin, Han
    Chen, Junshi
    Diaz, Jose Monsalve
    Xiao, Qian
    Lin, Rongfen
    Wang, Fei
    Gao, Guang R.
    An, Hong
    INFORMATION SCIENCES, 2021, 570 : 831 - 847
  • [48] Multi-task deep learning for large-scale buildings energy management
    Wang, Rui
    Rayhana, Rakiba
    Gholami, Majid
    Herrera, Omar E.
    Liu, Zheng
    Merida, Walter
    ENERGY AND BUILDINGS, 2024, 307
  • [49] Optimizing coagulant dosage using deep learning models with large-scale data
    Kim, J.
    Hua, C.
    Kim, K.
    Lin, S.
    Oh, G.
    Park, M.-H.
    Kang, S.
    CHEMOSPHERE, 2024, 350
  • [50] NetSentry: A deep learning approach to detecting incipient large-scale network attacks
    Liu, Haoyu
    Patras, Paul
    COMPUTER COMMUNICATIONS, 2022, 191 : 119 - 132