Robust Speaker Recognition in Noisy Conditions by Means of Online Training with Noise Profiles

Times cited: 3

Authors:

Al-Noori, Ahmed H. Y. [1]

Duncan, Philip [1]

Affiliations:

[1] Univ Salford, Sch Comp Sci & Engn, Salford M5 4WT, Lancs, England

Source:

JOURNAL OF THE AUDIO ENGINEERING SOCIETY | 2019 / Vol. 67 / No. 04

Keywords:

SUPPORT VECTOR MACHINES; SPEECH;

DOI:

10.17743/jaes.2019.0004

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Automated speaker recognition attains impressive reliability when tested under controlled laboratory acoustic conditions. The environmental noise that is inevitably present in many real-world speech samples considerably degrades recognition accuracy because of the so-called "channel mismatch" between the enrollment and recognition phases. This paper proposes a new online training method to improve the robustness of speaker recognition in noisy conditions. An estimate of the signal-to-noise ratio and an emulated ambient noise spectral profile, both obtained from the silence intervals of the speech signal, are used to re-enroll the reference model of the claimed speaker and so generate a new noisy reference model. The proposed online training method has been examined and validated using an MFCC-based GMM-UBM speaker recognition system. Results show significant improvement in performance.
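The data flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the energy-quantile silence detector, the frame sizes, and all function names (`estimate_noise_profile`, `emulate_noise`, `add_noise_at_snr`) are assumptions made here for illustration, and the MFCC extraction and GMM-UBM re-enrollment steps are left out. The sketch only shows how a noise profile and an SNR estimate taken from low-energy frames of the test utterance could be used to build a noisy copy of clean enrollment speech.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def estimate_noise_profile(noisy, frame_len=512, hop=256, silence_quantile=0.2):
    """Estimate a noise power spectrum and an SNR (dB) from low-energy frames.

    Assumption: frames whose energy falls in the lowest `silence_quantile`
    are treated as silence intervals. The input must also contain frames
    above that threshold, otherwise the speech estimate is undefined.
    """
    frames = frame_signal(noisy, frame_len, hop) * np.hanning(frame_len)
    energy = np.sum(frames ** 2, axis=1)
    silent = energy <= np.quantile(energy, silence_quantile)

    power_spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    noise_psd = power_spec[silent].mean(axis=0)

    noise_energy = max(energy[silent].mean(), 1e-12)
    # Non-silent frames hold speech plus noise, so subtract the noise part.
    speech_energy = max(energy[~silent].mean() - noise_energy, 1e-12)
    snr_db = 10.0 * np.log10(speech_energy / noise_energy)
    return noise_psd, snr_db


def emulate_noise(noise_psd, n_samples, rng):
    """Shape white Gaussian noise so its spectrum follows `noise_psd`."""
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    # Interpolate the frame-level profile onto the full-length frequency grid.
    grid_full = np.linspace(0.0, 1.0, len(spectrum))
    grid_prof = np.linspace(0.0, 1.0, len(noise_psd))
    magnitude = np.sqrt(np.interp(grid_full, grid_prof, noise_psd))
    return np.fft.irfft(spectrum * magnitude, n=n_samples)


def add_noise_at_snr(clean, noise, snr_db):
    """Mix `noise` into `clean` so the result has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise
```

In use, `estimate_noise_profile` would be run on the incoming test utterance, `emulate_noise` would generate noise as long as the stored clean enrollment recording, and the output of `add_noise_at_snr` would then be passed to whatever feature extraction and model training the recognition system already uses to produce the noisy reference model.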


Pages: 174-189

Page count: 16
