Multi-Channel Training for End-to-End Speaker Recognition under Reverberant and Noisy Environment

Cited by: 9
Authors
Cai, Danwei [1 ]
Qin, Xiaoyi [1 ,2 ]
Li, Ming [1 ]
Affiliations
[1] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Peoples R China
[2] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Peoples R China
Source
INTERSPEECH 2019 | 2019
Funding
National Natural Science Foundation of China;
Keywords
speaker recognition; far-field microphone array; multi-channel training; deep embeddings; IDENTIFICATION; ENHANCEMENT; FEATURES; SPEECH;
DOI
10.21437/Interspeech.2019-1437
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Despite the significant improvements in speaker recognition enabled by deep neural networks, unsatisfactory performance persists in far-field scenarios due to long-range fading, room reverberation, and environmental noise. In this study, we focus on far-field speaker recognition with a microphone array. We propose a multi-channel training framework for the deep speaker embedding neural network on noisy and reverberant data. The proposed multi-channel training framework simultaneously processes time, frequency, and channel information to learn a robust deep speaker embedding. Based on the 2-dimensional or 3-dimensional convolution layer, we investigate different multi-channel training schemes. Experiments on simulated multi-channel reverberant and noisy data show that the proposed method obtains significant improvements over the single-channel trained deep speaker embedding system with front-end speech enhancement or multi-channel embedding fusion.
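As an illustration only, the difference between feeding microphone-array features to a 2-dimensional versus a 3-dimensional convolution layer can be seen from the tensor shapes involved. The sketch below is not taken from the paper: the array size (4 microphones), feature size (64 frequency bins, 300 frames), kernel sizes, and the function names are all hypothetical, and the two layouts shown are just one plausible reading of "2-dimensional or 3-dimensional convolution".

```python
# Shape-only sketch (assumed layouts, not the authors' architecture).

def conv_out_len(n, k, stride=1, pad=0):
    """Output length of an undilated convolution along one axis."""
    return (n + 2 * pad - k) // stride + 1


def conv2d_shape(in_shape, out_ch, kernel, stride=(1, 1), pad=(0, 0)):
    """2-D case: in_shape = (mics, freq, time).

    The microphone axis is treated as the input feature-map axis, so the
    kernel spans all microphones at once and slides over freq and time only.
    """
    _, f, t = in_shape
    return (
        out_ch,
        conv_out_len(f, kernel[0], stride[0], pad[0]),
        conv_out_len(t, kernel[1], stride[1], pad[1]),
    )


def conv3d_shape(in_shape, out_ch, kernel, stride=(1, 1, 1), pad=(0, 0, 0)):
    """3-D case: in_shape = (1, mics, freq, time).

    The microphone axis is a spatial axis, so the kernel also slides
    across neighbouring microphones.
    """
    _, m, f, t = in_shape
    return (
        out_ch,
        conv_out_len(m, kernel[0], stride[0], pad[0]),
        conv_out_len(f, kernel[1], stride[1], pad[1]),
        conv_out_len(t, kernel[2], stride[2], pad[2]),
    )


# Hypothetical input: 4 microphones, 64 frequency bins, 300 frames.
shape_2d = conv2d_shape((4, 64, 300), out_ch=16, kernel=(3, 3), pad=(1, 1))
shape_3d = conv3d_shape((1, 4, 64, 300), out_ch=16, kernel=(3, 3, 3), pad=(1, 1, 1))
```

Under these assumptions the 2-D layer collapses the microphone axis in its first layer, while the 3-D layer keeps a microphone axis in its output, which later layers would still need to pool or reduce.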
Pages: 4365-4369
Page count: 5