On combining information from modulation spectra and mel-frequency cepstral coefficients for automatic detection of pathological voices

Cited by: 49
Authors
Arias-Londono, Julian David [1]
Godino-Llorente, Juan I. [1]
Markaki, Maria [2]
Stylianou, Yannis [2]
Affiliations
[1] Univ Politecn Madrid, EUIT Telecomunicac, Madrid 28031, Spain
[2] Univ Crete, Dept Comp Sci, Iraklion 71409, Crete, Greece
Keywords
Automatic assessment of voice; combining pattern classifiers; Gaussian mixture models; mel-frequency cepstral coefficients; modulation spectra; pathological voices; support vector machines; TO-NOISE RATIO; ACOUSTIC ANALYSIS; RECOGNITION; SPEECH; SYSTEM;
DOI
10.3109/14015439.2010.528788
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
This work presents a novel approach for the automatic detection of pathological voices based on fusing the information extracted by means of mel-frequency cepstral coefficients (MFCC) and features derived from the modulation spectra (MS). The proposed system uses a two-stage classification scheme: first, the MFCC and MS features feed two different and independent classifiers; then the outputs of both classifiers are used in a second classification stage. To establish the configuration that provides the highest detection accuracy, the fusion of information was carried out with different classifier combination strategies. The experiments used two databases: the one developed by the Massachusetts Eye and Ear Infirmary Voice Laboratory, and a database recorded by the Universidad Politecnica de Madrid. The results show that combining MFCC and MS features with the proposed approach improves the detection accuracy, demonstrating that the two parameterization methods are complementary.
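The second-stage fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which first-stage classifiers or which combination strategies were used, so this example assumes each first-stage classifier has already produced one score per recording and swaps in simple fixed combination rules (sum, product, max, min) after min-max score normalization. All function names, the toy scores, and the 0.5 threshold are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Map raw classifier scores to [0, 1] so the two systems are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all scores equal, so no ordering information exists.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Fixed combination rules (an assumption for this sketch, not the paper's choice).
FUSION_RULES: Dict[str, Callable[[float, float], float]] = {
    "sum": lambda a, b: (a + b) / 2.0,
    "product": lambda a, b: a * b,
    "max": max,
    "min": min,
}


def fuse_scores(
    mfcc_scores: Sequence[float],
    ms_scores: Sequence[float],
    rule: str = "sum",
) -> List[float]:
    """Combine the per-recording scores of the MFCC and MS classifiers."""
    if len(mfcc_scores) != len(ms_scores):
        raise ValueError("both classifiers must score the same recordings")
    combine = FUSION_RULES[rule]
    a = min_max_normalize(mfcc_scores)
    b = min_max_normalize(ms_scores)
    return [combine(x, y) for x, y in zip(a, b)]


def detect(fused: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Flag a recording as pathological when its fused score reaches the threshold."""
    return [s >= threshold for s in fused]


# Hypothetical first-stage outputs for four recordings (made-up numbers).
toy_mfcc_scores = [0.2, 0.9, 0.4, 0.7]
toy_ms_scores = [-1.0, 2.0, 1.0, 0.5]
toy_decisions = detect(fuse_scores(toy_mfcc_scores, toy_ms_scores, rule="sum"))
```

In a trained second stage, the fixed rule would be replaced by a classifier fitted on the pairs of first-stage scores; the fixed rules are used here only because they need no training data.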
Pages: 60-69
Number of pages: 10
Related papers
30 in total
[11]   Dimensionality reduction of a pathological voice quality assessment system based on Gaussian mixture models and short-term cepstral parameters [J].
Godino-Llorente, Juan Ignacio ;
Gomez-Vilda, Pedro ;
Blanco-Velasco, Manuel .
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2006, 53 (10) :1943-1953
[12]   A computer system for acoustic analysis of pathological voices and laryngeal diseases screening [J].
Hadjitodorov, S ;
Mitev, P .
MEDICAL ENGINEERING & PHYSICS, 2002, 24 (06) :419-429
[13]   THE MEANING AND USE OF THE AREA UNDER A RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE [J].
HANLEY, JA ;
MCNEIL, BJ .
RADIOLOGY, 1982, 143 (01) :29-36
[14]   Acoustic analysis of voice using WPCVox: A comparative study with multi dimensional voice program [J].
Ignacio Godino-Llorente, Juan ;
Osma-Ruiz, Victor ;
Saenz-Lechon, Nicolas ;
Cobeta-Marco, Ignacio ;
Gonzalez-Herranz, Ramon ;
Ramirez-Calvo, Carlos .
EUROPEAN ARCHIVES OF OTO-RHINO-LARYNGOLOGY, 2008, 265 (04) :465-476
[15]   Score normalization in multimodal biometric systems [J].
Jain, A ;
Nandakumar, K ;
Ross, A .
PATTERN RECOGNITION, 2005, 38 (12) :2270-2285
[16]   On combining classifiers [J].
Kittler, J ;
Hatef, M ;
Duin, RPW ;
Matas, J .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1998, 20 (03) :226-239
[17]   Kuncheva L.I., 2014, Combining Pattern Classifiers: Methods and Algorithms, DOI 10.1002/0471660264
[18]   Malyska N, 2005, INT CONF ACOUST SPEE, P873
[19]   MARKAKI M, 2010, P IEEE ICASSP DALL T
[20]   Using Modulation Spectra for Voice Pathology Detection and Classification [J].
Markaki, Maria ;
Stylianou, Yannis .
2009 ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-20, 2009, :2514-2517