Far-field speaker recognition

Cited by: 70
Authors
Jin, Qin [1]
Schultz, Tanja [1]
Waibel, Alex [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Language Technol Inst, Interact Syst labs, Pittsburgh, PA 15213 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / Issue 07
Keywords
far-field microphones; mismatched conditions; multilingual phone strings; robust speaker recognition;
DOI
10.1109/TASL.2007.902876
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In this paper, we study robust speaker recognition in far-field microphone situations. Two approaches are investigated to improve the robustness of speaker recognition in such scenarios. The first approach applies traditional techniques based on acoustic features. We introduce reverberation compensation as well as feature warping and gain significant improvements, even under mismatched training-testing conditions. In addition, we performed multiple channel combination experiments to make use of information from multiple distant microphones. Overall, we achieved relative improvements of up to 87.1% on our Distant Microphone database and found that the gains hold across different data conditions and microphone settings. The second approach makes use of higher-level linguistic features. To capture speaker idiosyncrasies, we apply n-gram models trained on multilingual phone strings and show that higher-level features are more robust under mismatched conditions. Furthermore, we compared the performance of multilingual and multiengine systems, and examined the impact of the number of languages involved on recognition results. Our findings confirm the usefulness of language variety and indicate the language-independent nature of this approach, which suggests that speaker recognition using multilingual phone strings could be successfully applied to any given language.
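The second approach in the abstract, n-gram models trained on phone strings, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the bigram order, the add-one smoothing, the length-normalized log-likelihood score, the highest-score decision rule, the function names, and the toy phone strings. The sketch covers a single phone stream only; combining streams from several languages is not shown.

```python
import math
from collections import Counter


def train_bigram(phones, vocab):
    """Build an add-one-smoothed bigram scorer from one speaker's phone string.

    Hypothetical helper: bigram order and add-one smoothing are assumptions.
    """
    pair_counts = Counter(zip(phones, phones[1:]))
    context_counts = Counter(phones[:-1])
    vocab_size = len(vocab)

    def logprob(prev, cur):
        # Counter returns 0 for unseen keys, so unseen bigrams get smoothed mass.
        return math.log((pair_counts[(prev, cur)] + 1)
                        / (context_counts[prev] + vocab_size))

    return logprob


def score(logprob, phones):
    """Average bigram log-likelihood of a test phone string under one model."""
    pairs = list(zip(phones, phones[1:]))
    return sum(logprob(p, c) for p, c in pairs) / max(len(pairs), 1)


def identify(models, phones):
    """Return the speaker whose model gives the test string the highest score."""
    return max(models, key=lambda spk: score(models[spk], phones))


# Toy data, invented for illustration only.
vocab = {"a", "b", "c"}
models = {
    "spk1": train_bigram("a b a b a b a b".split(), vocab),
    "spk2": train_bigram("a c a c a c".split(), vocab),
}
best = identify(models, "a b a b".split())
print(best)
```

In this toy setup the test string shares its phone-transition pattern with the first speaker's training string, so that speaker's model assigns it the higher score.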
Pages: 2023-2032
Number of pages: 10