Analyzing Features for Automatic Age Estimation on Cross-Sectional Data

Cited by: 0
Authors
Spiegl, Werner [1 ]
Stemmer, Georg [2 ]
Lasarcyk, Eva [3 ]
Kolhatkar, Varada [4 ]
Cassidy, Andrew [5 ]
Potard, Blaise [6 ]
Shum, Stephen [7 ]
Song, Young Chol [8 ]
Xu, Puyang [5 ]
Beyerlein, Peter [9 ]
Harnsberger, James [10 ]
Noeth, Elmar [1 ]
Affiliations
[1] Univ Erlangen Nurnberg, Chair Pattern Recognit LME, Erlangen, Germany
[2] SVOX Deutschland GmbH, Munich, Germany
[3] Saarland Univ, Dep Computat Linguist & Phonet, Saarbrucken, Germany
[4] Univ Minnesota, Dept Comp Sci, Duluth, MN 55812 USA
[5] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
[6] CRIN, Nancy, France
[7] Univ Calif Berkeley, Int Comp Sci Inst, Berkeley, CA 94720 USA
[8] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794 USA
[9] Univ Appl Sci Wildau, Dept Bioinformat, Berlin, Germany
[10] Univ Florida, Speech Percept Lab, Gainesville, FL 32611 USA
Source
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009
Keywords
Age regression; age estimation; vocal aging; prosodic features; support vector regression (SVR);
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We develop an acoustic feature set for estimating a person's age from a recorded speech signal. The baseline features are Mel-frequency cepstral coefficients (MFCCs), which are extended by various prosodic features, pitch, and formant frequencies. Experiments on the University of Florida Vocal Aging Database lead to two conclusions. On the one hand, adding prosodic, pitch, and formant features to the MFCC baseline yields relative reductions of the mean absolute error of between 4% and 20%. Improvements are even larger when perceptual age labels are taken as the reference. On the other hand, reasonable results, with a mean absolute error in age estimation of about 12 years, are already achieved using a simple gender-independent setup and MFCCs only. Future experiments will evaluate the robustness of the prosodic features against channel variability on other databases and investigate the differences between perceptual and chronological age labels.
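The two metrics quoted in the abstract, mean absolute error and its relative reduction over a baseline, can be illustrated with a minimal sketch. The ages and predictions below are hypothetical values chosen for illustration, not data from the paper, and the function names are assumptions of this example.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference, in years, between reference and predicted ages."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


def relative_reduction(baseline_mae, extended_mae):
    """Fraction by which the extended feature set lowers the baseline error."""
    return (baseline_mae - extended_mae) / baseline_mae


# Hypothetical reference ages and predictions (not taken from the paper).
true_ages = [25, 40, 63, 71]
baseline_pred = [35, 30, 51, 87]   # stands in for an MFCC-only system
extended_pred = [34, 31, 52, 84]   # stands in for MFCCs plus prosodic features

baseline_mae = mean_absolute_error(true_ages, baseline_pred)
extended_mae = mean_absolute_error(true_ages, extended_pred)
reduction = relative_reduction(baseline_mae, extended_mae)

print(f"baseline MAE: {baseline_mae:.2f} years")
print(f"extended MAE: {extended_mae:.2f} years")
print(f"relative reduction: {reduction:.1%}")
```

With these made-up numbers the relative reduction falls inside the 4% to 20% range reported in the abstract, but that is only by construction of the example.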
Pages: 2899 / +
Page count: 2