Convolutional Neural Networks for Distant Speech Recognition

Cited by: 180

Authors:

Swietojanski, Pawel [1]

Ghoshal, Arnab [2]

Renals, Steve [1]

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland

[2] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland

Source:

IEEE SIGNAL PROCESSING LETTERS | 2014 / Vol. 21 / No. 09

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

AMI corpus; convolutional neural networks; deep neural networks; distant speech recognition; meetings;

DOI:

10.1109/LSP.2014.2325781

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

We investigate convolutional neural networks (CNNs) for large vocabulary distant speech recognition, trained using speech recorded from a single distant microphone (SDM) and multiple distant microphones (MDM). In the MDM case we explore a beamformed signal input representation compared with the direct use of multiple acoustic channels as a parallel input to the CNN. We have explored different weight sharing approaches, and propose a channel-wise convolution with two-way pooling. Our experiments, using the AMI meeting corpus, found that CNNs improve the word error rate (WER) by 6.5% relative compared to conventional deep neural network (DNN) models and 15.7% over a discriminatively trained Gaussian mixture model (GMM) baseline. For cross-channel CNN training, the WER improves by 3.5% relative over the comparable DNN structure. Compared with the best beamformed GMM system, cross-channel convolution reduces the WER by 9.7% relative, and matches the accuracy of a beamformed DNN.
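The abstract reports its gains as relative WER reductions rather than absolute ones. As a minimal sketch of how such a figure is usually computed (the function name and the input values are illustrative assumptions, not taken from the paper):

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer over baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    # Absolute difference, normalised by the baseline error rate.
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values chosen only for illustration; they are NOT results from the paper.
baseline = 50.0
improved = 46.75
print(f"{relative_wer_reduction(baseline, improved):.1f}% relative")
```

With these made-up inputs, an absolute drop of 3.25 points from a 50.0% baseline corresponds to a 6.5% relative reduction, which shows why a relative figure should not be read as an absolute change in WER.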

Pages: 1120-1124

Page count: 5

50 records in total

[11] Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition [J].

Qian, Yanmin ;

Bi, Mengxiao ;

Tan, Tian ;

Yu, Kai .

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (12) :2263-2276

[12] Convolutional Maxout Neural Networks for Low-Resource Speech Recognition [J].

Cai, Meng ;

Shi, Yongzhe ;

Kang, Jian ;

Liu, Jia ;

Su, Tengrong .

2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, :133-+

[13] Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets [J].

Zielonka, Marta ;

Piastowski, Artur ;

Czyzewski, Andrzej ;

Nadachowski, Pawel ;

Operlejn, Maksymilian ;

Kaczor, Kamil .

ELECTRONICS, 2022, 11 (22)

[14] Speech Recognition Using Convolutional Neural Networks on Small Training Sets [J].

Poliyev, A. V. ;

Korsun, O. N. .

2019 WORKSHOP ON MATERIALS AND ENGINEERING IN AERONAUTICS, 2020, 714

[15] CONVOLUTIONAL NEURAL NETWORK TECHNIQUES FOR SPEECH EMOTION RECOGNITION [J].

Parthasarathy, Srinivas ;

Tashev, Ivan .

2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, :121-125

[16] CONVOLUTIONAL NEURAL NETWORKS-BASED CONTINUOUS SPEECH RECOGNITION USING RAW SPEECH SIGNAL [J].

Palaz, Dimitri ;

Magimai-Doss, Mathew ;

Collobert, Ronan .

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, :4295-4299

[17] Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks [J].

Zhang, Ying ;

Pezeshki, Mohammad ;

Brakel, Philemon ;

Zhang, Saizheng ;

Laurent, Cesar ;

Bengio, Yoshua ;

Courville, Aaron .

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, :410-414

[18] Performance prediction of automatic speech recognition systems using convolutional neural networks [J].

Elloumi, Zied ;

Lecouteux, Benjamin ;

Galibert, Olivier ;

Besacier, Laurent .

TRAITEMENT AUTOMATIQUE DES LANGUES, 2018, 59 (02) :49-76

[19] Dysarthric Speech Recognition Using Variational Mode Decomposition and Convolutional Neural Networks [J].

R. Rajeswari ;

T. Devi ;

S. Shalini .

Wireless Personal Communications, 2022, 122 :293-307

[20] FEATURE EXTRACTION USING MULTIMODAL CONVOLUTIONAL NEURAL NETWORKS FOR VISUAL SPEECH RECOGNITION [J].

Tatulli, Eric ;

Hueber, Thomas .

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, :2971-2975
