MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING

Cited by: 0

Authors:

Wiesler, Simon [1]

Richard, Alexander [1]

Schlueter, Ralf [1]

Ney, Hermann [1]

Affiliations:

[1] Rhein Westfal TH Aachen, Dept Comp Sci, Aachen, Germany

Source:

2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014

Keywords:

deep learning; optimization; speech recognition; LVCSR

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful for the optimization. We prove convergence of our algorithm in a convex setting. In our experiments we show that our proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows for training models with a factorized structure from scratch. We found this structure to be very useful not only because it accelerates training and decoding, but also because it is a very effective means against overfitting. Combining our proposed optimization algorithm with this model structure, model size can be reduced by a factor of eight and still improvements in recognition error rate are obtained. Additional gains are obtained by improving the Newbob learning rate strategy.


Pages: 5

