MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING

Cited by: 0
Authors
Wiesler, Simon [1]
Richard, Alexander [1]
Schlueter, Ralf [1]
Ney, Hermann [1]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Aachen, Germany
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
deep learning; optimization; speech recognition; LVCSR
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful for the optimization. We prove convergence of our algorithm in a convex setting. In our experiments we show that our proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows for training models with a factorized structure from scratch. We found this structure to be very useful not only because it accelerates training and decoding, but also because it is a very effective means against overfitting. Combining our proposed optimization algorithm with this model structure, model size can be reduced by a factor of eight while recognition error rate still improves. Additional gains are obtained by improving the Newbob learning rate strategy.
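The abstract does not spell out the update rule, so the following is only an illustrative sketch of the underlying idea that a non-zero feature mean hurts optimization. It assumes a single linear unit y = w.x + b, takes one plain gradient step in the centered parametrization y = w.(x - mu) + c, and maps the result back to (w, b). The function name, the arguments, and the choice of a single linear unit are hypothetical and are not taken from the paper.

```python
def mean_normalized_step(w, b, x, mu, err, lr):
    """One gradient step on y = w.x + b, taken in centered coordinates.

    w, x, mu: lists of floats of equal length (weights, input, feature mean).
    b: bias. err: dLoss/dy for this sample. lr: learning rate.
    All names are illustrative assumptions, not the paper's notation.
    """
    # Centered input, and the bias c of the equivalent centered model:
    # w.x + b == w.(x - mu) + c  with  c = b + w.mu
    centered = [xi - mi for xi, mi in zip(x, mu)]
    c = b + sum(wi * mi for wi, mi in zip(w, mu))

    # Plain SGD step in the centered parametrization.
    new_w = [wi - lr * err * ci for wi, ci in zip(w, centered)]
    new_c = c - lr * err

    # Map back to the original parametrization: b = c - w.mu
    new_b = new_c - sum(wi * mi for wi, mi in zip(new_w, mu))
    return new_w, new_b
```

With mu set to all zeros the step reduces to ordinary SGD on (w, b); with a non-zero mu the weight update uses the centered input, and the bias update is adjusted so that the model stays equivalent in the original coordinates.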
Pages: 5