Deep Learning Based Multi-Channel Speaker Recognition in Noisy and Reverberant Environments

Cited by: 15

Authors:

Taherian, Hassan [1]

Wang, Zhong-Qiu [1]

Wang, DeLiang [1, 2]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA

Source:

INTERSPEECH 2019 | 2019

Funding:

U.S. National Science Foundation (NSF)

Keywords:

Robust speaker recognition; beamforming; x-vector; deep neural network; Wiener filter

DOI:

10.21437/Interspeech.2019-1428

Chinese Library Classification (CLC) codes:

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification codes:

100104; 100213

Abstract:

Despite successful applications of multi-channel signal processing in robust automatic speech recognition (ASR), relatively little research has been conducted on the effectiveness of such techniques in the robust speaker recognition domain. This paper introduces time-frequency (T-F) masking-based beamforming to address text-independent speaker recognition in conditions where strong diffuse noise and reverberation are both present. We examine various masking-based beamformers, such as the parameterized multi-channel Wiener filter, the generalized eigenvalue (GEV) beamformer and the minimum variance distortionless response (MVDR) beamformer, and evaluate their performance in terms of speaker recognition accuracy for i-vector and x-vector based systems. In addition, we present a different formulation for estimating steering vectors from speech covariance matrices. We show that rank-1 approximation of a speech covariance matrix based on generalized eigenvalue decomposition leads to the best results for the masking-based MVDR beamformer. Experiments on the recently introduced NIST SRE 2010 retransmitted corpus show that the MVDR beamformer with rank-1 approximation provides an absolute reduction of 5.55% in equal error rate compared to a standard masking-based MVDR beamformer.
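The abstract compresses a mask-based MVDR pipeline into a few clauses. The following Python/NumPy code is a minimal sketch of one common way to realize it, not the authors' implementation: speech and noise spatial covariance matrices are estimated by weighting multi-channel STFT frames with a T-F mask (assumed to be given, e.g. predicted by a DNN), a steering vector is taken either from the principal eigenvector of the speech covariance matrix (the usual baseline) or from the textbook rank-1 GEVD approximation Φ_S ≈ λ(Φ_N v)(Φ_N v)^H, and MVDR weights are then formed. All function names are hypothetical, and the paper's exact rank-1 formulation may differ in detail.

```python
import numpy as np


def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance matrix for every frequency bin.

    stft: complex array (F, T, M) of multi-channel STFT coefficients.
    mask: real array (F, T) with values in [0, 1] (assumed to come from a DNN).
    Returns an (F, M, M) array of Hermitian covariance matrices.
    """
    weighted = mask[..., None] * stft                        # eta(t,f) * y(t,f)
    cov = np.einsum("ftm,ftn->fmn", weighted, stft.conj())   # sum_t eta y y^H
    return cov / np.maximum(mask.sum(axis=1), 1e-8)[:, None, None]


def principal_gev(phi_s, phi_n):
    """Principal generalized eigenvector v of phi_s v = lam phi_n v (one bin).

    Whitening with the Cholesky factor phi_n = L L^H turns the generalized
    problem into an ordinary Hermitian eigenproblem.
    """
    l_inv = np.linalg.inv(np.linalg.cholesky(phi_n))
    c = l_inv @ phi_s @ l_inv.conj().T
    c = 0.5 * (c + c.conj().T)                 # enforce exact Hermitian symmetry
    _, u = np.linalg.eigh(c)                   # eigenvalues in ascending order
    return l_inv.conj().T @ u[:, -1]           # v = L^{-H} u_max


def steering_vectors(phi_s, phi_n, method="gevd", ref=0):
    """Per-bin steering vectors, normalized to the reference microphone.

    "eig":  principal eigenvector of phi_s alone (common baseline).
    "gevd": rank-1 approximation phi_s ~ lam (phi_n v)(phi_n v)^H, so d ~ phi_n v.
    """
    n_freq, n_mic, _ = phi_s.shape
    d = np.empty((n_freq, n_mic), dtype=complex)
    for f in range(n_freq):
        if method == "gevd":
            d[f] = phi_n[f] @ principal_gev(phi_s[f], phi_n[f])
        else:
            d[f] = np.linalg.eigh(phi_s[f])[1][:, -1]
    return d / d[:, ref:ref + 1]


def mvdr_weights(phi_n, d):
    """MVDR weights w = phi_n^{-1} d / (d^H phi_n^{-1} d) for every bin."""
    num = np.linalg.solve(phi_n, d[..., None])[..., 0]       # phi_n^{-1} d, (F, M)
    den = np.einsum("fm,fm->f", d.conj(), num)               # d^H phi_n^{-1} d
    return num / den[:, None]


def beamform(w, stft):
    """Enhanced single-channel STFT: s_hat(t,f) = w(f)^H y(t,f)."""
    return np.einsum("fm,ftm->ft", w.conj(), stft)
```

With a speech mask `m_s` and noise mask `1 - m_s`, a typical call sequence would be `phi_s = masked_covariance(Y, m_s)`, `phi_n = masked_covariance(Y, 1 - m_s)`, then `beamform(mvdr_weights(phi_n, steering_vectors(phi_s, phi_n)), Y)`. Normalizing d to a reference microphone makes the distortionless constraint preserve the speech image at that microphone; in practice some diagonal loading of `phi_n` is often added for numerical stability (not shown).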
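The reported gain is an absolute difference in equal error rate (EER), the operating point at which the miss rate equals the false-alarm rate; an absolute reduction is measured in percentage points, not relative to the baseline EER. A minimal NumPy sketch of how EER can be approximated from verification trial scores is given here, assuming higher scores mean "same speaker". The function name is hypothetical, and official NIST scoring tools interpolate the operating point more carefully, so this is illustrative only.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the threshold over all observed scores."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, non]))
    # Miss (false rejection): a target trial scores below the threshold.
    frr = np.array([(tgt < t).mean() for t in thresholds])
    # False alarm: a non-target trial scores at or above the threshold.
    far = np.array([(non >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(frr - far))      # threshold where the two rates cross
    return float((frr[i] + far[i]) / 2)
```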


Pages: 4070-4074

Number of pages: 5
