Improved Acoustic Feature Combination for LVCSR by Neural Networks

Cited by: 0
Authors
Plahl, Christian [1]
Schlueter, Ralf [1]
Ney, Hermann [1]
Affiliation
[1] Rhein Westfal TH Aachen, Lehrstuhl Informat 6, Dept Comp Sci, Aachen, Germany
Source
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011
Keywords
feature extraction; multi-layer neural network; speech recognition
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper investigates the combination of different acoustic features. Several methods for combining such features, such as concatenation or LDA, are well known. Although LDA improves the system, feature combination by LDA has been shown to be suboptimal. We introduce a new combination method based on neural networks (NNs). The posterior estimates derived from the NN lead to a significant improvement, reducing the word error rate (WER) by 6% relative. Results are also compared to system combination. While system combination has been reported to outperform all other combination techniques, in this work the proposed NN-based combination outperforms system combination, achieving a 2% relative lower WER, which corresponds to a 7% relative improvement over the baseline system. Besides giving better recognition performance in terms of WER, NN-based combination reduces both training and testing complexity: overall, only a single set of acoustic models has to be trained, in addition to the NN.
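The abstract names concatenation followed by LDA as the well-known baseline for feature combination. The block below is a minimal sketch of that baseline, assuming frame-synchronous feature streams (same number of frames) and per-frame class labels such as state alignments. The function names, stream dimensions and WER figures are hypothetical illustrations, not taken from the paper. The NN-based combination proposed in the paper is not reproduced here, since the abstract does not specify its architecture. Relative WER improvement is computed as $(\mathrm{WER}_{\text{base}} - \mathrm{WER}_{\text{new}}) / \mathrm{WER}_{\text{base}}$.

```python
import numpy as np


def concatenate_streams(*streams):
    """Hypothetical helper: frame-wise concatenation of feature streams.

    Each stream is an array of shape (n_frames, dim_i); all streams must
    share the same number of frames.
    """
    if len({s.shape[0] for s in streams}) != 1:
        raise ValueError("all streams must have the same number of frames")
    return np.concatenate(streams, axis=1)


def lda_projection(x, labels, out_dim):
    """Hypothetical helper: estimate a (dim, out_dim) LDA projection matrix.

    Uses the within-class scatter S_w and between-class scatter S_b of the
    labelled frames and keeps the eigenvectors of S_w^{-1} S_b with the
    largest eigenvalues.
    """
    mean = x.mean(axis=0)
    dim = x.shape[1]
    s_w = np.zeros((dim, dim))
    s_b = np.zeros((dim, dim))
    for c in np.unique(labels):
        xc = x[labels == c]
        mc = xc.mean(axis=0)
        centered = xc - mc
        s_w += centered.T @ centered                 # within-class scatter
        diff = (mc - mean)[:, None]
        s_b += xc.shape[0] * (diff @ diff.T)         # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
    order = np.argsort(eigvals.real)[::-1]           # largest eigenvalues first
    return eigvecs[:, order[:out_dim]].real


def relative_wer_reduction(wer_base, wer_new):
    """Relative WER improvement of a new system over a baseline."""
    return (wer_base - wer_new) / wer_base


# Usage on synthetic data (hypothetical streams standing in for e.g. MFCC and PLP).
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
stream_a = rng.normal(size=(300, 16)) + labels[:, None]
stream_b = rng.normal(size=(300, 12)) - labels[:, None]
combined = concatenate_streams(stream_a, stream_b)   # shape (300, 28)
projection = lda_projection(combined, labels, out_dim=2)
reduced = combined @ projection                      # shape (300, 2)
```

With three classes the between-class scatter has rank at most two, so at most two LDA directions carry discriminative information; with real acoustic features the output dimension would instead be tuned on development data.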
Pages: 1244-1247
Number of pages: 4
Related papers (50 in total)
  • [1] FEATURE COMBINATION AND STACKING OF RECURRENT AND NON-RECURRENT NEURAL NETWORKS FOR LVCSR
    Plahl, Christian
    Kozielski, Michael
    Schlueter, Ralf
    Ney, Hermann
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6714 - 6718
  • [2] Hierarchical Neural Networks Feature Extraction for LVCSR system
    Valente, Fabio
    Vepa, Jithendra
    Plahl, Christian
    Gollan, Christian
    Hermansky, Hynek
    Schlueter, Ralf
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 265 - +
  • [3] Convolutional Neural Networks for Acoustic Modeling of Raw Time Signal in LVCSR
    Golik, Pavel
    Tueske, Zoltan
    Schlueter, Ralf
    Ney, Hermann
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 26 - 30
  • [4] Improving Russian LVCSR Using Deep Neural Networks for Acoustic and Language Modeling
    Kipyatkova, Irina
    SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 291 - 300
  • [5] Acoustic Modeling with Deep Neural Networks Using Raw Time Signal for LVCSR
    Tueske, Zoltan
    Golik, Pavel
    Schlueter, Ralf
    Ney, Hermann
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 890 - 894
  • [6] DEEP CONVOLUTIONAL NEURAL NETWORKS FOR LVCSR
    Sainath, Tara N.
    Mohamed, Abdel-rahman
    Kingsbury, Brian
    Ramabhadran, Bhuvana
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 8614 - 8618
  • [7] ON THE COMPRESSION OF RECURRENT NEURAL NETWORKS WITH AN APPLICATION TO LVCSR ACOUSTIC MODELING FOR EMBEDDED SPEECH RECOGNITION
    Prabhavalkar, Rohit
    Alsharif, Ouais
    Bruguier, Antoine
    McGraw, Ian
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5970 - 5974
  • [8] IMPROVEMENTS TO DEEP CONVOLUTIONAL NEURAL NETWORKS FOR LVCSR
    Sainath, Tara N.
    Kingsbury, Brian
    Mohamed, Abdel-rahman
    Dahl, George E.
    Saon, George
    Soltau, Hagen
    Beran, Tomas
    Aravkin, Aleksandr Y.
    Ramabhadran, Bhuvana
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 315 - 320
  • [9] Feature combination mixup: novel mixup method using feature combination for neural networks
    Takase, Tomoumi
    Neural Computing and Applications, 2023, 35 : 12763 - 12774
  • [10] Improved feature processing for Deep Neural Networks
    Rath, Shakti P.
    Povey, Daniel
    Vesely, Karel
    Cernocky, Jan
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 109 - 113