MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING

Cited by: 0
Authors:
Wiesler, Simon [1 ]
Richard, Alexander [1 ]
Schlueter, Ralf [1 ]
Ney, Hermann [1 ]
Affiliation:
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Aachen, Germany
Source:
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
Keywords:
deep learning; optimization; speech recognition; LVCSR
DOI:
Not available
Chinese Library Classification:
O42 [Acoustics]
Subject Classification Codes:
070206; 082403
Abstract:
Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful to the optimization. We prove convergence of our algorithm in a convex setting. In our experiments we show that the proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows models with a factorized structure to be trained from scratch. We found this structure to be very useful, not only because it accelerates training and decoding, but also because it is a very effective means against overfitting. By combining the proposed optimization algorithm with this model structure, model size can be reduced by a factor of eight while still improving the recognition error rate. Additional gains are obtained by improving the Newbob learning rate strategy.
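For orientation, here is a minimal NumPy sketch of one plausible reading of the mean-normalized update described in the abstract: an affine layer is reparameterized as y = W(x - mu) + b, with mu a running estimate of the layer-input mean, so that the weights always act on (approximately) zero-mean features. Everything concrete below (the function name step, the exponential-moving-average update, and the hyper-parameter values) is an illustrative assumption, not the paper's actual algorithm.

```python
import numpy as np

# Minimal sketch of a mean-normalized update for one affine layer,
# parameterized as y = W (x - mu) + b, where mu is a running estimate of
# the mean of the layer inputs.  Keeping the effective inputs roughly
# zero-mean is the property the abstract argues helps the optimization.

rng = np.random.default_rng(0)

d_in, d_out, batch = 64, 32, 128
W = rng.normal(scale=0.1, size=(d_out, d_in))
b = np.zeros(d_out)
mu = np.zeros(d_in)           # running mean of the inputs (assumed EMA estimate)
lr, momentum = 0.1, 0.99      # illustrative hyper-parameters, not from the paper


def step(x, grad_y):
    """One mean-normalized gradient step for the layer y = W (x - mu) + b.

    x      : (batch, d_in)  layer inputs
    grad_y : (batch, d_out) gradient of the loss w.r.t. the layer output
    """
    global W, b, mu
    # Update the running mean from the mini-batch (exponential moving average).
    mu = momentum * mu + (1.0 - momentum) * x.mean(axis=0)
    xc = x - mu                               # mean-centered inputs
    # Gradients in the mean-normalized parameterization.
    grad_W = grad_y.T @ xc / len(x)
    grad_b = grad_y.mean(axis=0)
    W -= lr * grad_W
    b -= lr * grad_b


# Call pattern with deliberately non-zero-mean features and a dummy
# squared-error gradient, just to show how the pieces fit together.
x = rng.normal(size=(batch, d_in)) + 3.0
target = rng.normal(size=(batch, d_out))
y = (x - mu) @ W.T + b
step(x, grad_y=(y - target))
```

The same centering idea would apply to hidden-layer activations as well, where the running mean has to be re-estimated continually because the activation statistics change as the network trains.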
Pages: 5