On combining information from modulation spectra and mel-frequency cepstral coefficients for automatic detection of pathological voices

Cited by: 49
Authors
Arias-Londono, Julian David [1]
Godino-Llorente, Juan I. [1]
Markaki, Maria [2]
Stylianou, Yannis [2]
Affiliations
[1] Univ Politecn Madrid, EUIT Telecomunicac, Madrid 28031, Spain
[2] Univ Crete, Dept Comp Sci, Iraklion 71409, Crete, Greece
Keywords
Automatic assessment of voice; combining pattern classifiers; Gaussian mixture models; mel-frequency cepstral coefficients; modulation spectra; pathological voices; support vector machines; TO-NOISE RATIO; ACOUSTIC ANALYSIS; RECOGNITION; SPEECH; SYSTEM;
DOI
10.3109/14015439.2010.528788
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
This work presents a novel approach to the automatic detection of pathological voices based on fusing information extracted from mel-frequency cepstral coefficients (MFCC) with features derived from the modulation spectra (MS). The proposed system uses a two-stage classification scheme. First, the MFCC and MS features feed two separate, independent classifiers; then the outputs of both classifiers are used in a second classification stage. To establish the configuration that provides the highest detection accuracy, the information was fused using different classifier combination strategies. The experiments were carried out on two databases: the one developed by the Massachusetts Eye and Ear Infirmary Voice Laboratory, and a database recorded by the Universidad Politecnica de Madrid. The results show that combining MFCC and MS features with the proposed approach improves detection accuracy, demonstrating that the two parameterization methods are complementary.
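The two-stage scheme described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the features are synthetic stand-ins for MFCC and MS vectors, the first stage uses one diagonal Gaussian per class (a simplification, not a full Gaussian mixture model), and the second stage is a plain logistic regression on the two stage-one scores. All function and class names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_split(n):
    """Synthetic stand-ins for per-recording MFCC and MS features (not real data)."""
    y = rng.integers(0, 2, size=n)  # 0 = normal, 1 = pathological
    mfcc = rng.normal(0.0, 1.0, size=(n, 12)) + 0.5 * y[:, None]
    ms = rng.normal(0.0, 1.0, size=(n, 8)) + 0.6 * y[:, None]
    return mfcc, ms, y

class DiagGaussianScorer:
    """Stage one: one diagonal Gaussian per class; the score is a log-likelihood ratio."""

    def fit(self, X, y):
        self.params = [
            (X[y == c].mean(axis=0), X[y == c].var(axis=0) + 1e-6) for c in (0, 1)
        ]
        return self

    def score(self, X):
        ll = [
            -0.5 * (((X - m) ** 2) / v + np.log(2 * np.pi * v)).sum(axis=1)
            for m, v in self.params
        ]
        return ll[1] - ll[0]  # positive values favour the pathological class

def fit_logistic(S, y, lr=0.1, steps=2000):
    """Stage two: logistic regression on the stacked scores, by gradient descent."""
    A = np.hstack([S, np.ones((len(S), 1))])  # append a bias column
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(A @ w)))
        w -= lr * (A.T @ (p - y)) / len(y)
    return w

def predict_logistic(w, S):
    A = np.hstack([S, np.ones((len(S), 1))])
    return (A @ w > 0).astype(int)

# Separate splits: stage-one training, fusion training, and final evaluation.
mfcc_tr, ms_tr, y_tr = make_split(400)
mfcc_dev, ms_dev, y_dev = make_split(400)
mfcc_te, ms_te, y_te = make_split(400)

# Stage one: two independent classifiers, one per feature set.
clf_mfcc = DiagGaussianScorer().fit(mfcc_tr, y_tr)
clf_ms = DiagGaussianScorer().fit(ms_tr, y_tr)

# Stage two: the outputs of both classifiers become a two-dimensional input.
S_dev = np.column_stack([clf_mfcc.score(mfcc_dev), clf_ms.score(ms_dev)])
S_te = np.column_stack([clf_mfcc.score(mfcc_te), clf_ms.score(ms_te)])
w_fusion = fit_logistic(S_dev, y_dev)

acc_mfcc = float(np.mean((S_te[:, 0] > 0).astype(int) == y_te))
acc_ms = float(np.mean((S_te[:, 1] > 0).astype(int) == y_te))
acc_fused = float(np.mean(predict_logistic(w_fusion, S_te) == y_te))
print(f"MFCC only: {acc_mfcc:.3f}  MS only: {acc_ms:.3f}  fused: {acc_fused:.3f}")
```

The fusion stage is trained on a split that the stage-one classifiers never saw, so the second stage learns from scores that resemble those it will meet at test time. A trained second stage is only one possible combination strategy; fixed rules such as summing or multiplying the stage-one scores could be swapped in at the same point.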
Pages: 60-69
Number of pages: 10