ON TIME-FREQUENCY MASK ESTIMATION FOR MVDR BEAMFORMING WITH APPLICATION IN ROBUST SPEECH RECOGNITION

Times cited: 0

Authors:

Xiao, Xiong [1]

Zhao, Shengkui [2]

Jones, Douglas L. [2]

Chng, Eng Siong [1,3]

Li, Haizhou [1,3,4,5]

Affiliations:

[1] Nanyang Technological University, Temasek Laboratories, Singapore

[2] Advanced Digital Sciences Center, Singapore

[3] Nanyang Technological University, School of Computer Science and Engineering, Singapore

[4] National University of Singapore, Department of Electrical and Computer Engineering, Singapore

[5] A*STAR, Institute for Infocomm Research, Singapore

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017

Keywords:

beamforming; robust speech recognition; time-frequency mask; neural networks; long short-term memory

DOI:

Not available

CLC number:

O42 [Acoustics]

Discipline codes:

070206; 082403

Abstract:

Acoustic beamforming has played a key role in robust automatic speech recognition (ASR) applications. Successfully applying minimum variance distortionless response (MVDR) beamforming requires accurate estimates of the speech and noise spatial covariance matrices (SCMs). Reliable estimation of time-frequency (TF) masks can improve the SCM estimates and significantly improve the performance of MVDR beamforming in ASR tasks. In this paper, we focus on TF mask estimation using recurrent neural networks (RNNs). Specifically, our methods include training the RNN to estimate the speech and noise masks independently, training the RNN to minimize the ASR cost function directly, and performing multiple passes to iteratively improve the mask estimates. The proposed methods are evaluated both individually and in combination on the CHiME-4 challenge. The results show that each proposed method improves ASR performance on its own and that the methods are complementary. The combined system achieves a word error rate (WER) of 8.9% with the 6-microphone configuration, much better than the 12.0% achieved by a state-of-the-art MVDR implementation.
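As a rough illustration of how TF masks feed into MVDR beamforming, the NumPy sketch below estimates mask-weighted speech and noise SCMs per frequency bin and derives MVDR weights from them. It assumes one common MVDR variant (steering vector taken as the principal eigenvector of the speech SCM, with diagonal loading on the noise SCM); this record does not state which variant the paper uses. The function names, the `diag_load` parameter, and the random masks standing in for RNN outputs are hypothetical.

```python
import numpy as np

def masked_scm(stft, mask, eps=1e-10):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    stft: complex array (M, F, T) holding M microphone channels.
    mask: real array (F, T) with values in [0, 1] (hypothetical RNN output).
    Returns a complex array of shape (F, M, M).
    """
    # Sum of x(f,t) x(f,t)^H over frames, each frame weighted by its mask value.
    scm = np.einsum("ft,mft,nft->fmn", mask, stft, stft.conj())
    # Normalize by the total mask weight in each frequency bin.
    return scm / (mask.sum(axis=1)[:, None, None] + eps)

def mvdr_weights(phi_s, phi_n, diag_load=1e-6):
    """MVDR weights per bin; steering vector = principal eigenvector of phi_s (assumed variant)."""
    F, M, _ = phi_s.shape
    weights = np.zeros((F, M), dtype=complex)
    for f in range(F):
        # eigh returns eigenvalues in ascending order, so the last column is the principal one.
        _, vecs = np.linalg.eigh(phi_s[f])
        d = vecs[:, -1]
        # Diagonal loading keeps the noise SCM well conditioned before solving.
        load = diag_load * np.trace(phi_n[f]).real / M
        phi_n_f = phi_n[f] + load * np.eye(M)
        num = np.linalg.solve(phi_n_f, d)   # Phi_N^{-1} d
        weights[f] = num / (d.conj() @ num)  # scale so that w^H d = 1 (distortionless)
    return weights

def apply_beamformer(weights, stft):
    """Beamformer output y(f, t) = w(f)^H x(f, t)."""
    return np.einsum("fm,mft->ft", weights.conj(), stft)

# Toy usage: 6 channels (as in the CHiME-4 6-mic setup), random data and masks.
rng = np.random.default_rng(0)
M, F, T = 6, 4, 50
stft = rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T))
speech_mask = rng.uniform(size=(F, T))
noise_mask = 1.0 - speech_mask
phi_s = masked_scm(stft, speech_mask)
phi_n = masked_scm(stft, noise_mask)
w = mvdr_weights(phi_s, phi_n)
y = apply_beamformer(w, stft)
print(y.shape)  # (4, 50)
```

The better the masks separate speech-dominated from noise-dominated TF bins, the closer `phi_s` and `phi_n` get to the true SCMs, which is why the abstract ties mask quality directly to MVDR performance.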


Pages: 3246-3250

Number of pages: 5

Similar records (10 of 50 shown):

[1] ROBUST MVDR BEAMFORMING USING TIME-FREQUENCY MASKS FOR ONLINE/OFFLINE ASR IN NOISE
Higuchi, Takuya
Ito, Nobutaka
Yoshioka, Takuya
Nakatani, Tomohiro
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5210 - 5214
[2] TIME-FREQUENCY CONVOLUTIONAL NETWORKS FOR ROBUST SPEECH RECOGNITION
Mitra, Vikramjit
Franco, Horacio
2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 317 - 323
[3] Binary and ratio time-frequency masks for robust speech recognition
Srinivasan, Soundararajan
Roman, Nicoleta
Wang, DeLiang
SPEECH COMMUNICATION, 2006, 48 (11) : 1486 - 1501
[4] Time-Frequency Masking For Large Scale Robust Speech Recognition
Wang, Yuxuan
Misra, Ananya
Chin, Kean K.
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2469 - 2473
[5] INTEGRATING DNN-BASED AND SPATIAL CLUSTERING-BASED MASK ESTIMATION FOR ROBUST MVDR BEAMFORMING
Nakatani, Tomohiro
Ito, Nobutaka
Higuchi, Takuya
Araki, Shoko
Kinoshita, Keisuke
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 286 - 290
[6] Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition
Soni, Meet
Panda, Ashish
INTERSPEECH 2019, 2019, : 426 - 430
[7] NEURAL NETWORK BASED TIME-FREQUENCY MASKING AND STEERING VECTOR ESTIMATION FOR TWO-CHANNEL MVDR BEAMFORMING
Liu, Yuzhou
Ganguly, Anshuman
Kamath, Krishna
Kristjansson, Trausti
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6717 - 6721
[8] Robust Beamforming for Speech Recognition Using DNN-Based Time-Frequency Masks Estimation
Jiang, Wenbin
Wen, Fei
Liu, Peilin
IEEE ACCESS, 2018, 6 : 52385 - 52392
[9] Label-Driven Time-Frequency Masking for Robust Speech Command Recognition
Soni, Meet
Sheikh, Imran
Kopparapu, Sunil Kumar
TEXT, SPEECH, AND DIALOGUE (TSD 2019), 2019, 11697 : 341 - 351
[10] Robust Automatic Speech Recognition System Based on Using Adaptive Time-Frequency Masking
Gouda, Ahmed Mostafa
Tamazin, Mohamed
Khedr, Mohamed
PROCEEDINGS OF 2016 11TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2016, : 181 - 186