3-D CNN MODELS FOR FAR-FIELD MULTI-CHANNEL SPEECH RECOGNITION

Authors
Ganapathy, Sriram [1]
Peddinti, Vijayaditya [2]
Affiliations
[1] Indian Inst Sci, Bangalore, Karnataka, India
[2] Google Inc, Mountain View, CA USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018
Keywords
Far-field speech recognition; 3-D CNN modeling; Multi-party conversational speech; Neural networks; Corpus
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Automatic speech recognition (ASR) in far-field reverberant environments, especially with natural conversational multi-party speech, is challenging even for state-of-the-art recognition methods. The two main issues are signal artifacts caused by reverberation and the presence of multiple speakers. In this paper, we propose a three-dimensional (3-D) convolutional neural network (CNN) architecture for multi-channel far-field ASR. The architecture applies convolutional layers over the time, frequency, and channel dimensions of the input spectrogram to learn representations. Experiments are performed on the REVERB challenge LVCSR task and the Augmented Multi-party Interaction (AMI) LVCSR task using the array microphone recordings. The proposed method improves over a baseline system that beamforms the multi-channel audio and then applies a conventional 2-D CNN, with an absolute improvement of 1.1% over the beamformed baseline on the AMI dataset.
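The record does not give layer configurations, so the following is only a minimal NumPy sketch, not the authors' implementation, contrasting the two input paths the abstract compares: a 3-D convolution that slides one kernel over (channel, time, frequency), versus collapsing the channels first and then applying a 2-D convolution over (time, frequency). The function names `conv3d_valid` and `conv2d_valid`, the kernel sizes, the 8-channel / 21-frame / 40-bin input, and the use of a plain channel mean as a crude stand-in for beamforming are all hypothetical assumptions for illustration. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_valid(x, w, b=0.0):
    """Single-filter 3-D cross-correlation with 'valid' padding (no padding).

    x: input of shape (C, T, F) = (channels, time frames, frequency bins)
    w: kernel of shape (kc, kt, kf)
    Returns shape (C - kc + 1, T - kt + 1, F - kf + 1).
    """
    windows = sliding_window_view(x, w.shape)  # (C', T', F', kc, kt, kf)
    return np.einsum("ctfijk,ijk->ctf", windows, w) + b


def conv2d_valid(x, w, b=0.0):
    """Single-filter 2-D cross-correlation over (time, frequency), 'valid' padding."""
    windows = sliding_window_view(x, w.shape)  # (T', F', kt, kf)
    return np.einsum("tfij,ij->tf", windows, w) + b


rng = np.random.default_rng(0)
C, T, F = 8, 21, 40  # hypothetical: 8 mics, 21 context frames, 40 log-mel bins
x = rng.standard_normal((C, T, F))

# 3-D path: the kernel spans 3 neighbouring channels as well as time and frequency.
w3 = rng.standard_normal((3, 5, 5))
y3 = np.maximum(conv3d_valid(x, w3), 0.0)  # ReLU
print(y3.shape)  # (6, 17, 36)

# Baseline-style path: channels are merged before any learned layer
# (a channel mean here, standing in for a real beamformer), then 2-D conv.
x_bf = x.mean(axis=0)
w2 = rng.standard_normal((5, 5))
y2 = np.maximum(conv2d_valid(x_bf, w2), 0.0)
print(y2.shape)  # (17, 36)
```

In this sketch the 3-D output keeps a reduced channel axis, so later layers can still use differences between microphones, whereas the channel-averaged path discards that axis before any learned filtering takes place.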
Pages: 5499-5503
Number of pages: 5