MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING

Cited by: 0
Authors
Wiesler, Simon [1 ]
Richard, Alexander [1 ]
Schlueter, Ralf [1 ]
Ney, Hermann [1 ]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Aachen, Germany
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
deep learning; optimization; speech recognition; LVCSR;
DOI
not available
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful to the optimization. We prove convergence of our algorithm in a convex setting. In our experiments, we show that the proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows training models with a factorized structure from scratch. We found this structure very useful, not only because it accelerates training and decoding, but also because it is a very effective means against overfitting. Combining the proposed optimization algorithm with this model structure, model size can be reduced by a factor of eight while still improving recognition error rate. Additional gains are obtained by improving the Newbob learning-rate strategy.
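The abstract's central claim, that a non-zero feature mean hurts optimization and can be countered by jointly estimating and subtracting that mean during SGD, can be illustrated on a toy linear least-squares model. This is a minimal sketch of the general idea only, not the paper's actual layer reparameterization; all variable names and hyperparameters here are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer y = W (x - mu) + b trained with SGD, where mu is a
# running estimate of the input mean that is updated alongside W and b.
n_features, n_outputs = 5, 1
W = np.zeros((n_outputs, n_features))
b = np.zeros(n_outputs)

# Synthetic data with a strongly non-zero feature mean (loc=10), the
# regime where plain SGD on uncentered inputs is ill-conditioned.
X = rng.normal(loc=10.0, scale=1.0, size=(1000, n_features))
true_w = rng.normal(size=(n_outputs, n_features))
Y = X @ true_w.T + 0.01 * rng.normal(size=(1000, n_outputs))

mu = X[:32].mean(axis=0)          # warm-start the mean from the first batch
lr, mom = 0.05, 0.99              # learning rate; mean-smoothing factor
for epoch in range(50):
    for i in range(0, len(X), 32):
        xb, yb = X[i:i + 32], Y[i:i + 32]
        mu = mom * mu + (1 - mom) * xb.mean(axis=0)  # track the input mean
        xc = xb - mu                                 # mean-normalized input
        err = xc @ W.T + b - yb                      # prediction error
        W -= lr * err.T @ xc / len(xb)               # SGD step on centered data
        b -= lr * err.mean(axis=0)                   # bias absorbs mu @ true_w

pred = (X - mu) @ W.T + b
mse = float(np.mean((pred - Y) ** 2))
```

Because the weight updates see only the centered inputs, their effective curvature no longer contains the large rank-one term contributed by the feature mean, which is the conditioning benefit the abstract refers to.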
Pages: 5
Related Papers
50 records in total
  • [41] Segmentation in large-scale cellular electron microscopy with deep learning: A literature survey
    Aswath, Anusha
    Alsahaf, Ahmad
    Giepmans, Ben N. G.
    Azzopardi, George
    MEDICAL IMAGE ANALYSIS, 2023, 89
  • [42] Non-intrusive model reduction of large-scale, nonlinear dynamical systems using deep learning
    Gao, Han
    Wang, Jian-Xun
    Zahr, Matthew J.
    PHYSICA D-NONLINEAR PHENOMENA, 2020, 412
  • [43] Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training
    Choi, Hyeonseong
    Lee, Jaehwan
    APPLIED SCIENCES-BASEL, 2021, 11 (21)
  • [44] Implementation of a Large-Scale Image Curation Workflow Using Deep Learning Framework
    Domalpally, Amitha
    Slater, Robert
    Barrett, Nancy
    Voland, Rick
    Balaji, Rohit
    Heathcote, Jennifer
    Channa, Roomasa
    Blodi, Barbara
    OPHTHALMOLOGY SCIENCE, 2022, 2 (04)
  • [45] A Data-Centric Approach for Analyzing Large-Scale Deep Learning Applications
    Vineet, S. Sai
    Joseph, Natasha Meena
    Korgaonkar, Kunal
    Paul, Arnab K.
    PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING AND NETWORKING, ICDCN 2023, 2023 : 282 - 283
  • [46] Deep Learning loss model for large-scale low voltage smart grids
    Velasco, Jose Angel
    Amaris, Hortensia
    Alonso, Monica
    INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2020, 121
  • [47] swFLOW: A large-scale distributed framework for deep learning on Sunway TaihuLight supercomputer
    Li, Mingfan
    Lin, Han
    Chen, Junshi
    Diaz, Jose Monsalve
    Xiao, Qian
    Lin, Rongfen
    Wang, Fei
    Gao, Guang R.
    An, Hong
    INFORMATION SCIENCES, 2021, 570 : 831 - 847
  • [48] Multi-task deep learning for large-scale buildings energy management
    Wang, Rui
    Rayhana, Rakiba
    Gholami, Majid
    Herrera, Omar E.
    Liu, Zheng
    Merida, Walter
    ENERGY AND BUILDINGS, 2024, 307
  • [49] Optimizing coagulant dosage using deep learning models with large-scale data
    Kim, J.
    Hua, C.
    Kim, K.
    Lin, S.
    Oh, G.
    Park, M.-H.
    Kang, S.
    CHEMOSPHERE, 2024, 350
  • [50] NetSentry: A deep learning approach to detecting incipient large-scale network attacks
    Liu, Haoyu
    Patras, Paul
    COMPUTER COMMUNICATIONS, 2022, 191 : 119 - 132