Speaker verification in score-ageing-quality classification space

Cited by: 27
Authors
Kelly, Finnian [1]
Drygajlo, Andrzej [2]
Harte, Naomi [1]
Affiliations
[1] Trinity Coll Dublin, Dept Elect & Elect Engn, Dublin, Ireland
[2] Swiss Fed Inst Technol Lausanne EPFL, Lausanne, Switzerland
Keywords
Speaker verification; Ageing; Quality measures
DOI
10.1016/j.csl.2012.12.005
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A challenge in automatic speaker verification is to create a system that is robust to the effects of vocal ageing. To observe the ageing effect, a speaker's voice must be analysed over a period of time, during which variation in the quality of the voice samples is likely to be encountered. Thus, in dealing with the ageing problem, the related issue of quality must also be addressed. We present a solution to speaker verification across ageing by using a stacked classifier framework to combine ageing and quality information with the scores of a baseline classifier. In tandem, the Trinity College Dublin Speaker Ageing database of 18 speakers, each covering a 30-60 year time range, is presented. An evaluation of a baseline Gaussian Mixture Model Universal Background Model (GMM-UBM) system using this database demonstrates a progressive degradation in genuine speaker verification scores as ageing progresses. Consequently, applying a conventional threshold, determined using scores at the time of enrolment, results in poor long-term performance. The influence of quality on verification scores is investigated via a number of quality measures. Alongside established signal-based measures, a new model-based measure, Wnorm, is proposed, and its utility is demonstrated on the CSLU database. Combining ageing information with quality measures and the scores from the GMM-UBM system, a verification decision boundary is created in score-ageing-quality space. The best performance is achieved by using scores and ageing in conjunction with the new Wnorm quality measure, reducing verification error by 45% relative to the baseline. This work represents the first comprehensive analysis of speaker verification on a longitudinal speaker database and successfully addresses the associated variability from ageing and quality artefacts. (C) 2013 Elsevier Ltd. All rights reserved.
Pages: 1068-1084
Page count: 17
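
The abstract's central idea, replacing a fixed score threshold with a decision boundary learned over score, ageing, and quality, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the trials are synthetic (the drift rate, quality effect, and noise level are invented), the `quality` value is a generic stand-in and not the Wnorm measure, and plain logistic regression is swapped in as the second-stage classifier because the record does not say which classifier the authors stacked on the GMM-UBM scores.

```python
import math
import random


def make_trials(n, rng):
    """Synthetic trials: (score, years_since_enrolment, quality, label).

    All constants are invented for illustration; they are not from the paper.
    """
    trials = []
    for i in range(n):
        label = 1 if i % 2 == 0 else 0          # 1 = genuine, 0 = impostor
        years = rng.uniform(0.0, 40.0)
        quality = rng.uniform(0.0, 1.0)
        if label == 1:
            # Genuine scores drift down with ageing and drop with poor quality.
            mean = 2.0 - 0.05 * years + 0.8 * (quality - 0.5)
        else:
            mean = -1.0
        trials.append((rng.gauss(mean, 0.5), years, quality, label))
    return trials


def features(trial):
    """Roughly centred features so plain gradient descent behaves well."""
    score, years, quality, _ = trial
    return [score, (years - 20.0) / 10.0, quality - 0.5]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))                # guard against overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(trials, lr=0.5, epochs=400):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    xs = [features(t) for t in trials]
    ys = [t[3] for t in trials]
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            gb += err
            for j, xj in enumerate(x):
                gw[j] += err * xj
        b -= lr * gb / n
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
    return w, b


def error_rate(decisions, trials):
    wrong = sum(1 for d, t in zip(decisions, trials) if d != t[3])
    return wrong / len(trials)


rng = random.Random(0)
train, test = make_trials(1000, rng), make_trials(1000, rng)

# Baseline: a single score threshold fixed from trials near enrolment (< 2 years).
early_genuine = [s for s, yrs, _, y in train if y == 1 and yrs < 2.0]
impostor = [s for s, _, _, y in train if y == 0]
threshold = 0.5 * (sum(early_genuine) / len(early_genuine)
                   + sum(impostor) / len(impostor))
baseline_err = error_rate([1 if t[0] > threshold else 0 for t in test], test)

# Stacked: a linear decision boundary in score-ageing-quality space.
w, b = train_logistic(train)
stacked = [1 if b + sum(wi * xi for wi, xi in zip(w, features(t))) > 0 else 0
           for t in test]
stacked_err = error_rate(stacked, test)

print(f"fixed-threshold error: {baseline_err:.3f}")
print(f"stacked-boundary error: {stacked_err:.3f}")
```

In this toy setup the fixed threshold rejects many genuine trials recorded decades after enrolment, while the learned boundary relaxes the score requirement as the ageing gap grows. The size of the gap depends entirely on the invented constants and says nothing about the 45% relative reduction reported in the abstract.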