ROBUST MULTI-CHANNEL SPEECH RECOGNITION USING FREQUENCY ALIGNED NETWORK

Cited by: 0

Authors:

Park, Taejin [1]

Kumatani, Kenichi [2]

Wu, Minhua [2]

Sundaram, Shiva [2]

Affiliations:

[1] Univ Southern Calif USC, Los Angeles, CA 90007 USA

[2] Amazon Inc, Sunnyvale, CA USA

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Keywords:

multi-channel acoustic modeling; beamforming; microphone arrays; automatic speech recognition;

DOI:

10.1109/icassp40776.2020.9053940

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Conventional speech enhancement techniques such as beamforming have known benefits for far-field speech recognition. Our own work in frequency-domain multi-channel acoustic modeling has shown additional improvements by training a spatial filtering layer jointly within an acoustic model. In this paper, we further develop this idea and use a frequency aligned network for robust multi-channel automatic speech recognition (ASR). Unlike an affine layer in the frequency domain, the proposed frequency aligned component prevents one frequency bin from influencing other frequency bins. We show that this modification not only reduces the number of parameters in the model but also significantly improves ASR performance. We investigate the effects of the frequency aligned network through ASR experiments on real-world far-field data where users are interacting with an ASR system in uncontrolled acoustic environments. We show that our multi-channel acoustic model with a frequency aligned network achieves up to an 18% relative reduction in word error rate.
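
The contrast the abstract draws between an affine layer and a frequency aligned component can be illustrated with a small sketch. This is not the authors' model: it assumes a block-diagonal layer in which each frequency bin has its own small weight matrix, uses real values where a real system would likely use complex DFT coefficients, and all function names and sizes are hypothetical.

```python
import random

def affine_param_count(num_bins, in_per_bin, out_per_bin):
    """Weights in a dense layer connecting every input bin to every output bin."""
    return (num_bins * in_per_bin) * (num_bins * out_per_bin)

def aligned_param_count(num_bins, in_per_bin, out_per_bin):
    """Weights when each frequency bin has its own small matrix (block-diagonal)."""
    return num_bins * in_per_bin * out_per_bin

def frequency_aligned_forward(x, weights):
    """Apply a separate matrix to each frequency bin.

    x[k] is the input vector for bin k (e.g. one value per microphone channel).
    weights[k] is an out_per_bin x in_per_bin matrix used only for bin k.
    """
    return [
        [sum(w * v for w, v in zip(row, x[k])) for row in weights[k]]
        for k in range(len(x))
    ]

random.seed(0)
num_bins, in_per_bin, out_per_bin = 4, 2, 3  # toy sizes, chosen for illustration
weights = [
    [[random.uniform(-1, 1) for _ in range(in_per_bin)] for _ in range(out_per_bin)]
    for _ in range(num_bins)
]
x = [[random.uniform(-1, 1) for _ in range(in_per_bin)] for _ in range(num_bins)]
y = frequency_aligned_forward(x, weights)

# Perturb the input of bin 0 only and recompute.
x_perturbed = [list(v) for v in x]
x_perturbed[0][0] += 1.0
y_perturbed = frequency_aligned_forward(x_perturbed, weights)

# Only bin 0 changes; the other bins are untouched by construction.
print(y_perturbed[1:] == y[1:])
print(affine_param_count(num_bins, in_per_bin, out_per_bin),
      aligned_param_count(num_bins, in_per_bin, out_per_bin))
```

Under these assumptions the per-bin layer needs a factor of `num_bins` fewer weights than the dense layer, and a change in one bin's input cannot reach another bin's output.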


Pages: 6859-6863

Page count: 5

50 items in total

[1] Multi-Channel Feature Adaptation for Robust Speech Recognition
Zhang, Zhaofeng
Xiao, Xiong
Wang, Longbiao
Dang, Jianwu
Iwahashi, Masahiro
Chng, Eng Siong
Li, Haizhou
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
[2] Robust Speech Recognition Using Feature-domain Multi-channel Bayesian Estimators
Principi, Emanuele
Rotili, Rudy
Cifani, Simone
Marinelli, Lorenzo
Squartini, Stefano
Piazza, Francesco
2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, 2010, : 2670 - 2673
[3] MULTI-CHANNEL OVERLAPPED SPEECH RECOGNITION WITH LOCATION GUIDED SPEECH EXTRACTION NETWORK
Chen, Zhuo
Xiao, Xiong
Yoshioka, Takuya
Erdogan, Hakan
Li, Jinyu
Gong, Yifan
2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 558 - 565
[4] Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement
Taherian, Hassan
Wang, Zhong-Qiu
Chang, Jorge
Wang, DeLiang
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1293 - 1302
[5] Robust automatic speech recognition using a multi-channel signal separation front-end
Yen, KC
Zhao, YX
ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1337 - 1340
[6] A unified network for multi-speaker speech recognition with multi-channel recordings
Liu, Conggui
Inoue, Nakamasa
Shinoda, Koichi
2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1304 - 1307
[7] FREQUENCY DOMAIN MULTI-CHANNEL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION
Wu, Minhua
Kumatani, Kenichi
Sundaram, Shiva
Strom, Nikko
Hoffmeister, Bjorn
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6640 - 6644
[8] Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-channel Speech Recognition
Li, Guanjun
Liang, Shan
Nie, Shuai
Liu, Wenju
Yang, Zhanlei
Xiao, Longshuai
INTERSPEECH 2020, 2020, : 51 - 55
[9] Robust speech recognition with multi-channel codebook dependent cepstral normalization (MCDCN)
Deligne, S
Gopinath, R
ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001, : 151 - 154
[10] Environmental robust speech and speaker recognition through multi-channel histogram equalization
Squartini, Stefano
Principi, Emanuele
Rotili, Rudy
Piazza, Francesco
NEUROCOMPUTING, 2012, 78 (01) : 111 - 120
