Robust Speaker Recognition in Noisy Conditions by Means of Online Training with Noise Profiles

Cited by: 3
Authors
Al-Noori, Ahmed H. Y. [1]
Duncan, Philip [1]
Affiliations
[1] Univ Salford, Sch Comp Sci & Engn, Salford M5 4WT, Lancs, England
Source
JOURNAL OF THE AUDIO ENGINEERING SOCIETY | 2019 / Vol. 67 / No. 04
Keywords
SUPPORT VECTOR MACHINES; SPEECH;
DOI
10.17743/jaes.2019.0004
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Automated speaker recognition attains impressive reliability when tested under controlled laboratory acoustic conditions. The environmental noise that is inevitably present in many real-world speech samples considerably degrades recognition accuracy because of the so-called "channel mismatch" between the enrollment and recognition phases. This paper proposes a new online training method to improve the robustness of speaker recognition in noisy conditions. An estimate of the signal-to-noise ratio and an emulation of the ambient noise spectral profile, both obtained from the silence intervals of the speech signal, are used to re-enroll the reference model for a claimed speaker, generating a new noisy reference model. The proposed online training method has been examined and validated using an MFCC-GMM-UBM-based speaker recognition system. Results show significant improvement in performance.
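The noise-profiling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the energy-based silence detection, the shaping of white noise by an averaged magnitude spectrum, and every function name and parameter below are assumptions made for illustration. The re-enrollment itself (MFCC extraction and GMM-UBM adaptation on the noise-corrupted enrollment speech) is not shown.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def estimate_noise_profile(noisy, frame_len=400, hop=160, silence_fraction=0.2):
    """Return (average noise magnitude spectrum, estimated SNR in dB).

    Assumption: the lowest-energy frames are silence intervals that
    contain only ambient noise (a crude energy-based detector).
    """
    frames = frame_signal(noisy, frame_len, hop)
    energy = np.mean(frames ** 2, axis=1)
    n_sil = max(1, int(silence_fraction * len(energy)))
    order = np.argsort(energy)
    sil, speech = order[:n_sil], order[n_sil:]
    window = np.hanning(frame_len)
    noise_mag = np.mean(np.abs(np.fft.rfft(frames[sil] * window, axis=1)), axis=0)
    noise_power = max(np.mean(energy[sil]), 1e-12)
    # Subtract the noise floor from the remaining frames to approximate speech power.
    speech_power = max(np.mean(energy[speech]) - noise_power, 1e-12)
    return noise_mag, 10.0 * np.log10(speech_power / noise_power)

def emulate_noise(noise_mag, n_samples, frame_len=400, rng=None):
    """Synthesize noise with the given spectral shape by filtering white noise block by block."""
    rng = np.random.default_rng() if rng is None else rng
    n_blocks = int(np.ceil(n_samples / frame_len))
    white = rng.standard_normal((n_blocks, frame_len))
    shaped = np.fft.irfft(np.fft.rfft(white, axis=1) * noise_mag[None, :],
                          n=frame_len, axis=1)
    return shaped.reshape(-1)[:n_samples]

def add_noise_at_snr(clean, noise, snr_db):
    """Scale the noise so that clean + noise has the requested SNR, then mix."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

def make_noisy_enrollment(clean_enrollment, noisy_test, rng=None):
    """Corrupt clean enrollment speech with noise emulated from the test utterance."""
    noise_mag, snr_db = estimate_noise_profile(noisy_test)
    noise = emulate_noise(noise_mag, len(clean_enrollment), rng=rng)
    return add_noise_at_snr(clean_enrollment, noise, snr_db)
```

In this sketch, the output of `make_noisy_enrollment` would be passed to whatever feature extraction and model training the recognizer already uses, so that the reference model is rebuilt under noise conditions close to those of the test utterance. Processing the emulated noise in independent blocks leaves small discontinuities at block boundaries; an overlap-add scheme would be smoother.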
Pages: 174-189
Page count: 16