Convolutional Neural Networks for Distant Speech Recognition

Cited by: 180
Authors
Swietojanski, Pawel [1]
Ghoshal, Arnab [2]
Renals, Steve [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland
[2] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
AMI corpus; convolutional neural networks; deep neural networks; distant speech recognition; meetings
DOI
10.1109/LSP.2014.2325781
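Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract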
We investigate convolutional neural networks (CNNs) for large vocabulary distant speech recognition, trained using speech recorded from a single distant microphone (SDM) and multiple distant microphones (MDM). In the MDM case we explore a beamformed signal input representation compared with the direct use of multiple acoustic channels as a parallel input to the CNN. We have explored different weight sharing approaches, and propose a channel-wise convolution with two-way pooling. Our experiments, using the AMI meeting corpus, found that CNNs improve the word error rate (WER) by 6.5% relative compared to conventional deep neural network (DNN) models and 15.7% over a discriminatively trained Gaussian mixture model (GMM) baseline. For cross-channel CNN training, the WER improves by 3.5% relative over the comparable DNN structure. Compared with the best beamformed GMM system, cross-channel convolution reduces the WER by 9.7% relative, and matches the accuracy of a beamformed DNN.
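The relative WER figures quoted in the abstract follow the usual convention of expressing the drop in error rate as a fraction of the baseline error rate. A minimal sketch of that arithmetic is given below; the function name and the example WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer over baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    # Absolute reduction divided by the baseline, scaled to a percentage.
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs chosen only to illustrate the calculation.
baseline = 50.0   # baseline system WER, in percent
improved = 46.75  # improved system WER, in percent
print(round(relative_wer_reduction(baseline, improved), 1))  # prints 6.5
```

With these illustrative numbers, an absolute reduction of 3.25 points corresponds to a 6.5% relative reduction, which is why relative and absolute improvements should not be compared directly.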
Pages: 1120-1124
Number of pages: 5