Multi-Channel Training for End-to-End Speaker Recognition under Reverberant and Noisy Environment

Cited by: 9

Authors:

Cai, Danwei [1]

Qin, Xiaoyi [1,2]

Li, Ming [1]

Affiliations:

[1] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Peoples R China

[2] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Peoples R China

Source:

INTERSPEECH 2019 | 2019

Funding:

National Natural Science Foundation of China;

Keywords:

speaker recognition; far-field microphone array; multi-channel training; deep embeddings; IDENTIFICATION; ENHANCEMENT; FEATURES; SPEECH;

DOI:

10.21437/Interspeech.2019-1437

Chinese Library Classification (CLC):

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

Despite the significant improvements in speaker recognition enabled by deep neural networks, unsatisfactory performance persists in far-field scenarios due to the effects of long-range fading, room reverberation, and environmental noise. In this study, we focus on far-field speaker recognition with a microphone array. We propose a multi-channel training framework for the deep speaker embedding neural network on noisy and reverberant data. The proposed multi-channel training framework simultaneously processes time, frequency, and channel information to learn a robust deep speaker embedding. Based on 2-dimensional or 3-dimensional convolution layers, we investigate different multi-channel training schemes. Experiments on simulated multi-channel reverberant and noisy data show that the proposed method obtains significant improvements over the single-channel trained deep speaker embedding system with front-end speech enhancement or multi-channel embedding fusion.
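The abstract names 2-dimensional and 3-dimensional convolution layers but gives no architectural details. As a rough illustration only, the sketch below shows one way the two options could differ in how they treat the microphone axis of a multi-channel spectrogram. The function names, kernel size, padding, and example sizes (4 microphones, 64 frequency bins, 300 frames, 16 output feature maps) are assumptions chosen for illustration, not values taken from the paper.

```python
from typing import Tuple


def conv_out_len(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Standard output-length formula for a convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def conv2d_first_layer_shape(
    mics: int, freq: int, frames: int, out_maps: int,
    kernel: int = 3, padding: int = 1,
) -> Tuple[int, int, int]:
    # Assumed 2D scheme: the microphone axis is used as the input
    # feature-map axis, so the kernel slides over (freq, frames) only
    # and the microphones are mixed in the very first layer.
    # `mics` therefore does not appear in the output shape.
    return (
        out_maps,
        conv_out_len(freq, kernel, 1, padding),
        conv_out_len(frames, kernel, 1, padding),
    )


def conv3d_first_layer_shape(
    mics: int, freq: int, frames: int, out_maps: int,
    kernel: int = 3, padding: int = 1,
) -> Tuple[int, int, int, int]:
    # Assumed 3D scheme: the input is a single feature map holding a
    # (mics, freq, frames) volume, so the kernel also slides along the
    # microphone axis and that axis survives in the output.
    return (
        out_maps,
        conv_out_len(mics, kernel, 1, padding),
        conv_out_len(freq, kernel, 1, padding),
        conv_out_len(frames, kernel, 1, padding),
    )


if __name__ == "__main__":
    print(conv2d_first_layer_shape(4, 64, 300, 16))
    print(conv3d_first_layer_shape(4, 64, 300, 16))
```

Under these assumptions the 2D layer collapses the microphone axis immediately, while the 3D layer keeps it as a separate dimension that later layers can continue to process.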


Pages: 4365-4369

Page count: 5

