Deep Learning Based Multi-Channel Speaker Recognition in Noisy and Reverberant Environments

Cited by: 15
Authors
Taherian, Hassan [1]
Wang, Zhong-Qiu [1]
Wang, DeLiang [1,2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Source
INTERSPEECH 2019 | 2019
Funding
U.S. National Science Foundation
Keywords
Robust speaker recognition; beamforming; x-vector; deep neural network; Wiener filter
DOI
10.21437/Interspeech.2019-1428
Chinese Library Classification (CLC) code
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
Despite successful applications of multi-channel signal processing in robust automatic speech recognition (ASR), relatively little research has been conducted on the effectiveness of such techniques in the robust speaker recognition domain. This paper introduces time-frequency (T-F) masking-based beamforming to address text-independent speaker recognition in conditions where strong diffuse noise and reverberation are both present. We examine various masking-based beamformers, such as the parameterized multi-channel Wiener filter, the generalized eigenvalue (GEV) beamformer and the minimum variance distortionless response (MVDR) beamformer, and evaluate their performance in terms of speaker recognition accuracy for i-vector and x-vector based systems. In addition, we present an alternative formulation for estimating steering vectors from speech covariance matrices. We show that a rank-1 approximation of the speech covariance matrix based on generalized eigenvalue decomposition leads to the best results for the masking-based MVDR beamformer. Experiments on the recently introduced NIST SRE 2010 retransmitted corpus show that the MVDR beamformer with rank-1 approximation provides an absolute reduction of 5.55% in equal error rate compared to a standard masking-based MVDR beamformer.
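The masking-based MVDR pipeline summarized in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a real-valued T-F speech mask is already available (e.g. predicted by a DNN), takes the noise mask as its complement, and estimates the steering vector from the principal generalized eigenvector v of the pair (Phi_s, Phi_n) as d ∝ Phi_n v, which is what a rank-1 speech covariance model Phi_s = lambda d d^H implies. This is one common GEVD-based estimator and may differ from the exact formulation in the paper. The MVDR weights are w = Phi_n^{-1} d / (d^H Phi_n^{-1} d), and the enhanced signal is w(f)^H y(t,f). All function names and the diagonal-loading constant are hypothetical.

```python
import numpy as np


def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance per frequency (hypothetical helper).

    stft: complex array (F, T, M) of multi-channel STFT coefficients.
    mask: real array (F, T) with values in [0, 1].
    Returns a complex array (F, M, M): sum_t mask * y y^H / sum_t mask.
    """
    weighted = mask[..., None] * stft
    cov = np.einsum("ftm,ftn->fmn", weighted, stft.conj())
    return cov / np.maximum(mask.sum(axis=1), 1e-8)[:, None, None]


def gevd_steering_vector(phi_s, phi_n):
    """Steering vector from the principal generalized eigenvector of (phi_s, phi_n).

    Solves phi_s v = lambda phi_n v by Cholesky whitening (phi_n = L L^H).
    Under a rank-1 model phi_s = lambda d d^H, v is proportional to
    phi_n^{-1} d, so phi_n v recovers d up to a complex scale.
    """
    L = np.linalg.cholesky(phi_n)
    L_inv = np.linalg.inv(L)
    C = L_inv @ phi_s @ L_inv.conj().T      # whitened speech covariance
    _, U = np.linalg.eigh(C)                # eigenvalues in ascending order
    v = L_inv.conj().T @ U[:, -1]           # principal generalized eigenvector
    d = phi_n @ v
    return d / d[0]                         # relative to reference mic 0


def mvdr_weights(phi_n, d):
    """MVDR weights w = phi_n^{-1} d / (d^H phi_n^{-1} d)."""
    num = np.linalg.solve(phi_n, d)
    return num / (d.conj() @ num)


def mask_based_mvdr(stft, speech_mask, loading=1e-6):
    """Apply a per-frequency mask-based MVDR beamformer; returns (F, T)."""
    F, T, M = stft.shape
    phi_s = masked_covariance(stft, speech_mask)
    phi_n = masked_covariance(stft, 1.0 - speech_mask)
    out = np.empty((F, T), dtype=complex)
    for f in range(F):
        # Diagonal loading keeps the noise covariance positive definite.
        phi_n_f = phi_n[f] + loading * np.eye(M)
        d = gevd_steering_vector(phi_s[f], phi_n_f)
        w = mvdr_weights(phi_n_f, d)
        out[f] = stft[f] @ w.conj()         # w^H y for every frame
    return out
```

By construction the weights satisfy the distortionless constraint w^H d = 1, so the component along the estimated steering vector passes unchanged while the mask-estimated noise power is minimized. In a speaker recognition front end, the beamformed STFT would then be converted to features for the i-vector or x-vector system.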
Pages: 4070-4074
Number of pages: 5
Related papers (50 in total)
  • [1] Multi-Channel Training for End-to-End Speaker Recognition under Reverberant and Noisy Environment
    Cai, Danwei
    Qin, Xiaoyi
    Li, Ming
    INTERSPEECH 2019, 2019, : 4365 - 4369
  • [2] Self-Attention for Multi-Channel Speech Separation in Noisy and Reverberant Environments
    Liu, Conggui
    Sato, Yoshinao
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 794 - 799
  • [3] Application of combined temporal and spectral processing methods for speaker recognition under noisy, reverberant or multi-speaker environments
    Krishnamoorthy, P.
    Prasanna, S. R. Mahadeva
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2009, 34 (05): : 729 - 754
  • [4] Application of combined temporal and spectral processing methods for speaker recognition under noisy, reverberant or multi-speaker environments
    P. Krishnamoorthy
    S. R. Mahadeva Prasanna
    Sadhana, 2009, 34 : 729 - 754
  • [5] PERFORMANCE MONITORING FOR AUTOMATIC SPEECH RECOGNITION IN NOISY MULTI-CHANNEL ENVIRONMENTS
    Meyer, Bernd T.
    Mallidi, Sri Harish
    Martinez, Angel Mario Castro
    Paya-Vaya, Guillermo
    Kayser, Hendrik
    Hermansky, Hynek
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 50 - 56
  • [6] Multi-channel noise reduction in noisy environments
    Li, Junfeng
    Akagi, Masato
    Suzuki, Yoiti
    CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274 : 258 - +
  • [7] Nonlinear filtering for speaker tracking in noisy and reverberant environments
    Vermaak, J
    Blake, A
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURAL NETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 3021 - 3024
  • [8] Speaker recognition system in multi-channel environment
    Sang, LF
    Wu, ZH
    Yang, YC
    2003 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-5, CONFERENCE PROCEEDINGS, 2003, : 3116 - 3121
  • [9] Deep Learning Methods for Multi-Channel EEG-Based Emotion Recognition
    Olamat, Ali
    Ozel, Pinar
    Atasever, Sema
    INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2022, 32 (05)
  • [10] Glottal information based spectral recuperation in multi-channel speaker recognition
    Yang, P
    Yang, YC
    Wu, ZH
    ADVANCES IN BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2004, 3338 : 602 - 609