ROBUST MULTI-CHANNEL SPEECH RECOGNITION USING FREQUENCY ALIGNED NETWORK

被引:0
|
作者
Park, Taejin [1 ]
Kumatani, Kenichi [2 ]
Wu, Minhua [2 ]
Sundaram, Shiva [2 ]
机构
[1] Univ Southern Calif USC, Los Angeles, CA 90007 USA
[2] Amazon Inc, Sunnyvale, CA USA
来源
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020年
关键词
multi-channel acoustic modeling; beamforming; microphone arrays; automatic speech recognition;
D O I
10.1109/icassp40776.2020.9053940
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Conventional speech enhancement technique such as beamforming has known benefits for far-field speech recognition. Our own work in frequency-domain multi-channel acoustic modeling has shown additional improvements by training a spatial filtering layer jointly within an acoustic model. In this paper, we further develop this idea and use frequency aligned network for robust multi-channel automatic speech recognition (ASR). Unlike an affine layer in the frequency domain, the proposed frequency aligned component prevents one frequency bin influencing other frequency bins. We show that this modification not only reduces the number of parameters in the model but also significantly and improves the ASR performance. We investigate effects of frequency aligned network through ASR experiments on the real-world far-field data where users are interacting with an ASR system in uncontrolled acoustic environments. We show that our multi-channel acoustic model with a frequency aligned network shows up to 18% relative reduction in word error rate.
引用
收藏
页码:6859 / 6863
页数:5
相关论文
共 50 条
  • [1] Multi-Channel Feature Adaptation for Robust Speech Recognition
    Zhang, Zhaofeng
    Xiao, Xiong
    Wang, Longbiao
    Dang, Jianwu
    Iwahashi, Masahiro
    Chng, Eng Siong
    Li, Haizhou
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [2] Robust Speech Recognition Using Feature-domain Multi-channel Bayesian Estimators
    Principi, Emanuele
    Rotili, Rudy
    Cifani, Simone
    Marinelli, Lorenzo
    Squartini, Stefano
    Piazza, Francesco
    2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, 2010, : 2670 - 2673
  • [3] MULTI-CHANNEL OVERLAPPED SPEECH RECOGNITION WITH LOCATION GUIDED SPEECH EXTRACTION NETWORK
    Chen, Zhuo
    Xiao, Xiong
    Yoshioka, Takuya
    Erdogan, Hakan
    Li, Jinyu
    Gong, Yifan
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 558 - 565
  • [4] Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement
    Taherian, Hassan
    Wang, Zhong-Qiu
    Chang, Jorge
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1293 - 1302
  • [5] Robust automatic speech recognition using a multi-channel signal separation front-end
    Yen, KC
    Zhao, YX
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1337 - 1340
  • [6] A unified network for multi-speaker speech recognition with multi-channel recordings
    Liu, Conggui
    Inoue, Nakamasa
    Shinoda, Koichi
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1304 - 1307
  • [7] FREQUENCY DOMAIN MULTI-CHANNEL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION
    Wu Minhua
    Kumatani, Kenichi
    Sundaram, Shiva
    Strom, Nikko
    Hoffmeister, Bjorn
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6640 - 6644
  • [8] Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-channel Speech Recognition
    Li, Guanjun
    Liang, Shan
    Nie, Shuai
    Liu, Wenju
    Yang, Zhanlei
    Xiao, Longshuai
    INTERSPEECH 2020, 2020, : 51 - 55
  • [9] Robust speech recognition with multi-channel codebook dependent cepstral normalization (MCDCN)
    Deligne, S
    Gopinath, R
    ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001, : 151 - 154
  • [10] Environmental robust speech and speaker recognition through multi-channel histogram equalization
    Squartini, Stefano
    Principi, Emanuele
    Rotili, Rudy
    Piazza, Francesco
    NEUROCOMPUTING, 2012, 78 (01) : 111 - 120