Automatic detection of voice impairments from text-dependent running speech

Cited by: 41
Authors
Godino-Llorente, J. I. [1 ]
Fraile, Ruben [1 ]
Saenz-Lechon, N. [1 ]
Osma-Ruiz, V. [1 ]
Gomez-Vilda, P. [2 ]
Affiliations
[1] Univ Politecn Madrid, Dept Circuits & Syst Engn, Madrid 28031, Spain
[2] Univ Politecn Madrid, Dept Comp Sci & Engn, Madrid 28031, Spain
Keywords
Running speech; Pathological voices; Mel cepstral parameters; Noise parameters; Voiced detection; Multilayer perceptron; TO-NOISE RATIO; PATHOLOGICAL VOICE; QUALITY; DISCRIMINATION; RECOGNITION;
DOI
10.1016/j.bspc.2009.01.007
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Acoustic analysis is a useful tool for diagnosing voice diseases. It also offers several advantages: it is non-invasive, it provides an objective diagnosis, and it can be used to evaluate surgical and pharmacological treatments and rehabilitation processes. Most approaches in the literature detect voice impairments automatically from the sustained phonation of vowels. This paper proposes a new scheme for detecting voice impairments from text-dependent running speech. The proposed methodology segments the speech into voiced and non-voiced frames and parameterises each voiced frame with mel-frequency cepstral parameters. Classification is carried out with a discriminative approach based on a multilayer perceptron neural network. The data used to train the system were taken from the voice disorders database distributed by Kay Elemetrics. The material used for training and testing contains the running speech corresponding to the well-known "rainbow passage" of 140 patients (23 normal and 117 pathological). The results are compared with those obtained using sustained vowels. Text-dependent running speech showed a slight improvement in detection accuracy. (C) 2009 Elsevier Ltd. All rights reserved.
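The front end described in the abstract (framing, voiced-frame selection, mel-frequency cepstral parameterisation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the energy and zero-crossing voiced detector is a crude stand-in for whatever detector the paper uses, and the sampling rate, frame length, hop, filter count, thresholds, and all function names are assumptions chosen for illustration. The multilayer perceptron stage is omitted.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def voiced_mask(frames, energy_ratio=0.1, max_zcr=0.25):
    """Assumed stand-in detector: voiced = high energy and low zero-crossing rate."""
    energy = np.mean(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return (energy > energy_ratio * energy.max()) & (zcr < max_zcr)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with centres equally spaced on the mel scale."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mfcc(frames, sr, n_filters=24, n_coeffs=12):
    """Mel-frequency cepstral coefficients, one row per input frame."""
    n_fft = frames.shape[1]
    power = np.abs(np.fft.rfft(frames * np.hamming(n_fft), axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II of the log mel energies, written out explicitly
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return log_mel @ dct.T

# Synthetic demo (not speech data): half a second of a 150 Hz tone
# followed by half a second of low-level white noise.
sr = 16000                                   # assumed sampling rate
t = np.arange(sr // 2) / sr
tone = np.sin(2 * np.pi * 150 * t)
noise = 0.01 * np.random.default_rng(0).standard_normal(sr // 2)
signal = np.concatenate([tone, noise])

frames = frame_signal(signal, frame_len=400, hop=200)   # 25 ms frames, 50% overlap
mask = voiced_mask(frames)
feats = mfcc(frames[mask], sr)
print(frames.shape, int(mask.sum()), feats.shape)
```

In a full system along the lines the abstract describes, the per-frame feature rows in `feats` would be passed to a multilayer perceptron classifier (for example scikit-learn's `MLPClassifier`), and the frame-level decisions would then be combined into a per-recording decision; how that combination is done is not stated in this record.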
Pages: 176-182
Number of pages: 7
Related papers
31 in total
[1] [Anonymous], 1997, P EUR C SPEECH COMM
[2] [Anonymous], MASS EYE EAR INF VOI
[3] Baken R. J., 2000, Clinical Measurement of Speech and Voice
[4] Bishop Christopher M, 1995, Neural networks for pattern recognition
[5] César M, 2000, P ANN INT IEEE EMBS, V22, P2369, DOI 10.1109/IEMBS.2000.900621
[6] Childers D.G., 2000, Speech processing and synthesis toolboxes
[7] Childers, D. G.; Bae, K. S. Detection of laryngeal function using speech and electroglottographic data [J]. IEEE Transactions on Biomedical Engineering, 1992, 39(01): 19-25
[8] Davis, S. B.; Mermelstein, P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences [J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980, 28(04): 357-366
[9] de Krom, G. A cepstrum-based technique for determining a harmonics-to-noise ratio in speech signals [J]. Journal of Speech and Hearing Research, 1993, 36(02): 254-266
[10] de Krom, G. Some spectral correlates of pathological breathy and rough voice quality for different types of vowel fragments [J]. Journal of Speech and Hearing Research, 1995, 38(04): 794-811