Score Level versus Audio Level Fusion for Voice Pathology Detection on the Saarbrucken Voice Database

Times cited: 0

Authors:

Martinez, David [1]

Lleida, Eduardo [1]

Ortega, Alfonso [1]

Miguel, Antonio [1]

Affiliation:

[1] Univ Zaragoza, Aragon Inst Engn Res I3A, E-50009 Zaragoza, Spain

Source:

ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES | 2012 / Vol. 328

Keywords:

Pathological Voice Detection; Saarbrucken Voice Database; GMM; Fusion; MultiFocal toolkit

DOI:

Not available

Chinese Library Classification number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The article presents a set of experiments on pathological voice detection over the Saarbrucken Voice Database (SVD). The SVD is freely available online and contains voice recordings covering a range of pathologies, both functional and organic. It includes recordings of more than 2000 speakers producing the sustained vowels /a/, /i/, and /u/ with normal, low, high, and low-high-low intonation. This variety of material makes it possible to design different experiments. In this paper, two systems are compared. In the first, all vowels and intonations are pooled together to train a single model per class; we call this audio level fusion. In the second, a separate model per class is trained for each vowel and intonation, and the scores of these subsystems are fused at the end; we call this score level fusion. For classification, a generative Gaussian mixture model (GMM) is trained on mel-frequency cepstral coefficients (MFCCs), harmonics-to-noise ratio (HNR), normalized noise energy (NNE), and glottal-to-noise excitation ratio (GNE). The results show that score level fusion is far more effective than audio level fusion.
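The two configurations compared in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a single diagonal-covariance Gaussian per class in place of a full GMM, synthetic three-dimensional frames in place of real MFCC, HNR, NNE, and GNE features, and a plain linear logistic-regression fusion trained by gradient descent, in the spirit of (but not using) the FoCal/MultiFocal toolkit. All function names, the toy data generator, and the learning-rate settings are hypothetical.

```python
import math
import random

# Hypothetical task grid: the three SVD vowels times the four intonations.
VOWELS = ["a", "i", "u"]
INTONATIONS = ["normal", "low", "high", "low-high-low"]
TASKS = [(v, t) for v in VOWELS for t in INTONATIONS]

def fit_diag_gaussian(frames):
    """Fit one diagonal-covariance Gaussian (a 1-component stand-in for a GMM)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6) for d in range(dim)]
    return mean, var

def avg_loglik(frames, model):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def llr(frames, models):
    """Log-likelihood ratio score: pathological model versus healthy model."""
    return avg_loglik(frames, models["path"]) - avg_loglik(frames, models["norm"])

def train_audio_level(speakers):
    """Audio level fusion: pool all vowels/intonations into one model per class."""
    pooled = {"path": [], "norm": []}
    for recs, label in speakers:
        for task in TASKS:
            pooled["path" if label else "norm"].extend(recs[task])
    return {c: fit_diag_gaussian(fr) for c, fr in pooled.items()}

def score_audio_level(recs, models):
    """Score all frames of all of a speaker's recordings against the pooled models."""
    return llr([f for task in TASKS for f in recs[task]], models)

def train_score_level(speakers):
    """Score level fusion, stage 1: a separate model pair per vowel/intonation."""
    models = {}
    for task in TASKS:
        path = [f for recs, y in speakers if y for f in recs[task]]
        norm = [f for recs, y in speakers if not y for f in recs[task]]
        models[task] = {"path": fit_diag_gaussian(path), "norm": fit_diag_gaussian(norm)}
    return models

def subsystem_scores(recs, models):
    """One LLR score per vowel/intonation subsystem."""
    return [llr(recs[task], models[task]) for task in TASKS]

def sigmoid(z):
    """Logistic function, with z clipped so math.exp cannot overflow."""
    return 1.0 / (1.0 + math.exp(-max(min(z, 500.0), -500.0)))

def train_fusion(score_vectors, labels, lr=0.05, epochs=300):
    """Stage 2: linear logistic-regression fusion w.s + b by batch gradient descent."""
    dim, n = len(score_vectors[0]), len(labels)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for s, y in zip(score_vectors, labels):
            err = sigmoid(b + sum(wi * si for wi, si in zip(w, s))) - y
            gw = [g + err * si for g, si in zip(gw, s)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def fuse(scores, fusion):
    """Apply the learned linear fusion to one vector of subsystem scores."""
    w, b = fusion
    return b + sum(wi * si for wi, si in zip(w, scores))

def synth_speaker(label, rng, frames=40, dim=3):
    """Toy data: pathological (label 1) frames are shifted by +1 in every dimension."""
    shift = 1.0 if label else 0.0
    recs = {t: [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(frames)] for t in TASKS}
    return recs, label

# Toy experiment on synthetic speakers (alternating healthy/pathological labels).
rng = random.Random(0)
train = [synth_speaker(i % 2, rng) for i in range(20)]
test_sick, test_healthy = synth_speaker(1, rng)[0], synth_speaker(0, rng)[0]

audio_models = train_audio_level(train)
audio_sick = score_audio_level(test_sick, audio_models)
audio_healthy = score_audio_level(test_healthy, audio_models)

task_models = train_score_level(train)
train_scores = [subsystem_scores(r, task_models) for r, _ in train]
fusion = train_fusion(train_scores, [y for _, y in train])
fused_sick = fuse(subsystem_scores(test_sick, task_models), fusion)
fused_healthy = fuse(subsystem_scores(test_healthy, task_models), fusion)
```

For brevity, the fusion weights here are trained on the same speakers used to train the subsystem models; a real evaluation would train the fusion on held-out scores (for example, via cross-validation) to avoid optimistic weights. What the sketch isolates is where the combination happens: audio level fusion merges frames before modeling, while score level fusion keeps one detector per vowel and intonation and learns a weight for each detector's score.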


Pages: 110+

Number of pages: 3
