HYBRID ACOUSTIC MODELS FOR DISTANT AND MULTICHANNEL LARGE VOCABULARY SPEECH RECOGNITION

Cited by: 0

Authors:

Swietojanski, Pawel [1]

Ghoshal, Arnab [1]

Renals, Steve [1]

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland

Source:

2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU) | 2013

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

Distant Speech Recognition; Deep Neural Networks; Microphone Arrays; Beamforming; Meeting recognition;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We investigate the application of deep neural network (DNN)-hidden Markov model (HMM) hybrid acoustic models for far-field speech recognition of meetings recorded using microphone arrays. We show that the hybrid models achieve significantly better accuracy than conventional systems based on Gaussian mixture models (GMMs). We observe up to 8% absolute word error rate (WER) reduction from a discriminatively trained GMM baseline when using a single distant microphone, and between 4% and 6% absolute WER reduction when using beamforming on various combinations of array channels. By training the networks on audio from multiple channels, we find that the networks can recover a significant part of the accuracy difference between the single distant microphone and beamformed configurations. Finally, we show that the accuracy of a network recognising speech from a single distant microphone can approach that of a multi-microphone setup by training with data from other microphones.
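The abstract reports its gains as absolute WER reductions. As a minimal sketch of what that metric means, the Python below computes WER as the word-level edit distance divided by the reference length, and contrasts absolute with relative reduction. The function names and any example sentences are illustrative assumptions; they are not taken from the paper, and this is not the scoring tool the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Reduction expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer
```

An absolute reduction is a plain difference between two WER values, so the same absolute gain corresponds to a larger relative gain when the baseline WER is lower.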


Pages: 285-290

Page count: 6

50 items in total

[41] Attention-based LSTM with Multi-task Learning for Distant Speech Recognition
Zhang, Yu
Zhang, Pengyuan
Yan, Yonghong
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3857 - 3861
[42] ON TRAINING THE RECURRENT NEURAL NETWORK ENCODER-DECODER FOR LARGE VOCABULARY END-TO-END SPEECH RECOGNITION
Lu, Liang
Zhang, Xingxing
Renals, Steve
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5060 - 5064
[43] Channel Selection for Distant Speech Recognition Exploiting Cepstral Distance
Guerrero, Cristina
Tryfou, Georgina
Omologo, Maurizio
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1986 - 1990
[44] Neural Blind Source Separation and Diarization for Distant Speech Recognition
Bando, Yoshiaki
Nakamura, Tomohiko
Watanabe, Shinji
INTERSPEECH 2024, 2024, : 722 - 726
[45] Experimental Study on Dereverberation and Noise Reduction for Distant Speech Recognition
Fu, Zhong-Hua
Xie, Lei
Lv, Hang
2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 393 - 397
[46] Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition
Li, Bo
Sainath, Tara N.
Weiss, Ron J.
Wilson, Kevin W.
Bacchiani, Michiel
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1976 - 1980
[47] Cepstral distance based channel selection for distant speech recognition
Flores, Cristina Guerrero
Tryfou, Georgina
Omologo, Maurizio
COMPUTER SPEECH AND LANGUAGE, 2018, 47 : 314 - 332
[48] Distilling Knowledge for Distant Speech Recognition via Parallel Data
Yi, Jiangyan
Tao, Jianhua
2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 170 - 175
[49] Contaminated speech training methods for robust DNN-HMM distant speech recognition
Ravanelli, Mirco
Omologo, Maurizio
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 756 - 760
[50] CONTEXT DEPENDENT STATE TYING FOR SPEECH RECOGNITION USING DEEP NEURAL NETWORK ACOUSTIC MODELS
Bacchiani, Michiel
Rybach, David
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
