Improved Acoustic Feature Combination for LVCSR by Neural Networks

Cited by: 0

Authors:

Plahl, Christian [1]

Schlueter, Ralf [1]

Ney, Hermann [1]

Affiliations:

[1] Rhein Westfal TH Aachen, Lehrstuhl Informat 6, Dept Comp Sci, Aachen, Germany

Source:

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011

Keywords:

feature extraction; multi-layer neural network; speech recognition

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper investigates the combination of different acoustic features. Several methods for combining these features, such as concatenation or LDA, are well known. Even though LDA improves the system, feature combination by LDA has been shown to be suboptimal. We introduce a new method based on neural networks (NNs). The posterior estimates derived from the NN lead to a significant improvement, achieving a 6% relative improvement in word error rate (WER). Results are also compared to system combination. While system combination has been reported to outperform all other combination techniques, in this work the proposed NN-based combination outperforms system combination. We achieve a 2% relative improvement in WER, resulting in an improvement of 7% relative to the baseline system. In addition to giving better recognition performance w.r.t. WER, NN-based combination reduces both training and testing complexity. Overall, we use a single set of acoustic models, together with the training of the NN.
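The abstract reports its gains as relative WER improvements. As a minimal sketch of how such a figure is conventionally computed, the snippet below uses the standard definition of relative reduction. The function name and the WER values are hypothetical illustrations, not results from the paper, which gives no absolute WERs in this record.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in percent (for example, 20.0).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical values chosen only to illustrate the arithmetic.
baseline = 20.0   # assumed baseline WER in percent
combined = 18.8   # assumed WER after feature combination in percent
print(round(relative_wer_improvement(baseline, combined), 1))
```

Under this definition, a drop from 20.0% to 18.8% WER is a 1.2-point absolute gain and a 6% relative gain.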


Pages: 1244-1247

Number of pages: 4

