MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING

Cited by: 0
Authors
Wiesler, Simon [1 ]
Richard, Alexander [1 ]
Schlueter, Ralf [1 ]
Ney, Hermann [1 ]
Affiliation
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Aachen, Germany
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
deep learning; optimization; speech recognition; LVCSR;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful for the optimization. We prove convergence of our algorithm in a convex setting. In our experiments we show that our proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows for training models with a factorized structure from scratch. We found this structure to be very useful not only because it accelerates training and decoding, but also because it is a very effective means against overfitting. Combining our proposed optimization algorithm with this model structure, model size can be reduced by a factor of eight while still improving the recognition error rate. Additional gains are obtained by improving the Newbob learning rate strategy.
Pages: 5
Related papers
50 in total
  • [1] The Powerball Method With Biased Stochastic Gradient Estimation for Large-Scale Learning Systems
    Yang, Zhuang
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024,
  • [2] Value function gradient learning for large-scale multistage stochastic programming problems
    Lee, Jinkyu
    Bae, Sanghyeon
    Kim, Woo Chang
    Lee, Yongjae
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2023, 308 (01) : 321 - 335
  • [3] Designing Reconfigurable Large-Scale Deep Learning Systems Using Stochastic Computing
    Ren, Ao
    Li, Zhe
    Wang, Yanzhi
    Qiu, Qinru
    Yuan, Bo
    2016 IEEE INTERNATIONAL CONFERENCE ON REBOOTING COMPUTING (ICRC), 2016,
  • [4] Large-scale Deep Learning at Baidu
    Yu, Kai
    PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013, : 2211 - 2211
  • [5] Straggler-Aware Gradient Aggregation for Large-Scale Distributed Deep Learning System
    Li, Yijun
    Huang, Jiawei
    Li, Zhaoyi
    Liu, Jingling
    Zhou, Shengwen
    Zhang, Tao
    Jiang, Wanchun
    Wang, Jianxin
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (06) : 4917 - 4930
  • [6] Large-Scale Deep Learning for Building Intelligent Computer Systems
    Dean, Jeff
    PROCEEDINGS OF THE NINTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'16), 2016, : 1 - 1
  • [7] Large-scale Pollen Recognition with Deep Learning
    de Geus, Andre R.
    Barcelos, Celia A. Z.
    Batista, Marcos A.
    da Silva, Sergio F.
    2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [8] Deep Learning on Large-scale Muticore Clusters
    Sakiyama, Kazumasa
    Kato, Shinpei
    Ishikawa, Yutaka
    Hori, Atsushi
    Monrroy, Abraham
    2018 30TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2018), 2018, : 314 - 321
  • [9] HammingMesh: A Network Topology for Large-Scale Deep Learning
    Hoefler, Torsten
    Bonato, Tommaso
    De Sensi, Daniele
    Di Girolamo, Salvatore
    Li, Shigang
    Heddes, Marco
    Belk, Jon
    Goel, Deepak
    Castro, Miguel
    Scott, Steve
    SC22: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2022,
  • [10] Rich Punctuations Prediction Using Large-scale Deep Learning
    Wu, Xueyang
    Zhu, Su
    Wu, Yue
    Yu, Kai
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,