Kullback-Leibler divergence and sample skewness for pathological voice quality assessment

Cited by: 13

Authors:

Barreira, Ramiro R. A. ^{[1]}

Ling, Lee Luan ^{[1]}

Affiliation:

[1] Univ Estadual Campinas, Fac Engn Eletr & Comp, Dept Comunicacoes, Av Albert Einstein 400,Cidade Univ Zeferino Vaz, BR-13083852 Campinas, SP, Brazil

Source:

BIOMEDICAL SIGNAL PROCESSING AND CONTROL | 2020 / Vol. 57

Keywords:

Voice pathology detection; Kullback-Leibler divergence; Mel-frequency cepstral coefficients (MFCC); Generalized extreme value (GEV) distribution; Gaussian mixture models (GMM); Naïve Bayes classifier; TO-NOISE RATIO; GLOTTAL CHARACTERISTICS; AUTOMATIC DETECTION; PARAMETERS; SPEAKERS

DOI:

10.1016/j.bspc.2019.101697

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering]

Discipline classification code:

0831

Abstract:

This paper proposes new features aiming to improve the performance of an automatic voice pathology detection system. The features are designed specifically around the effects of voice pathologies on the speech signal. The system is intended to deliver high accuracy with a low number of parameters. Kullback-Leibler divergence (KLD) applied to consecutive frames of the speech signal provides a measure of voice instability. In this work, the KLD is applied to each frame's histogram and to a modified form of its spectrum, named the higher amplitude suppression spectrum (HASS). The H-KLD (histogram KLD) and the HASS-KLD are two of the three features proposed here. The third feature, the short-term sample skewness of the signal, provides the level of damping of the voice pitch period waveform. The H-KLD, the HASS-KLD, and the sample skewness are employed along with mel-frequency cepstral coefficients (MFCC) in a voice pathology detection system. The system is composed of a Gaussian mixture model (GMM) classifier and two generalized extreme value (GEV) distribution classifiers, which are fused by means of a Gaussian naive Bayes classifier. A standard subset of the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database is adopted for evaluating the system. The obtained global success rate of 99.55% shows that the proposed features are suitable for pathological voice quality assessment. © 2019 Elsevier Ltd. All rights reserved.
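As a rough illustration of two of the features named in the abstract, the sketch below computes a histogram-based KLD between consecutive frames and a per-frame sample skewness. It is not the authors' implementation: the abstract does not give the frame length, hop size, number of bins, smoothing, or the direction of the divergence, so `frame_len`, `hop`, `n_bins`, `eps`, the shared amplitude range, and all function names here are assumptions chosen for illustration. The HASS-KLD is not attempted, because the abstract does not define the HASS.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split a sequence into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def histogram(frame, n_bins, lo, hi, eps=1e-6):
    """Normalized amplitude histogram over [lo, hi].

    Every bin starts at eps (assumed smoothing) so that no probability is zero
    and the logarithm in the KLD stays finite.
    """
    counts = [eps] * n_bins
    width = (hi - lo) / n_bins or 1.0  # guard against a constant signal
    for v in frame:
        k = int((v - lo) / width)
        counts[max(0, min(k, n_bins - 1))] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def sample_skewness(frame):
    """Sample skewness m3 / m2**1.5 from the biased central moments."""
    n = len(frame)
    mean = sum(frame) / n
    m2 = sum((v - mean) ** 2 for v in frame) / n
    m3 = sum((v - mean) ** 3 for v in frame) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5

def frame_features(x, frame_len=200, hop=100, n_bins=15):
    """Return (KLD between consecutive frame histograms, skewness of each frame)."""
    frames = frame_signal(x, frame_len, hop)
    lo, hi = min(x), max(x)  # assumption: one amplitude range shared by all frames
    hists = [histogram(f, n_bins, lo, hi) for f in frames]
    h_kld = [kl_divergence(hists[i], hists[i + 1]) for i in range(len(hists) - 1)]
    skew = [sample_skewness(f) for f in frames]
    return h_kld, skew

# Synthetic stand-ins for a steady and an unsteady voiced signal (not real speech).
stable = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
unstable = [(1 + 0.8 * math.sin(2 * math.pi * i / 400)) * s
            for i, s in enumerate(stable)]

kld_stable, skew_stable = frame_features(stable)
kld_unstable, skew_unstable = frame_features(unstable)
```

On these synthetic signals, the amplitude-modulated sequence yields larger frame-to-frame divergences than the steady sinusoid, which is the sense in which a consecutive-frame KLD can act as an instability measure.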


Pages: 11

References (30 in total):

[1] [Anonymous], 1994, VOIC DIS DAT VERS 1

[2] Automatic Detection of Pathological Voices Using Complexity Measures, Noise Parameters, and Mel-Cepstral Coefficients [J]. Arias-Londono, Julian D.; Godino-Llorente, Juan I.; Saenz-Lechon, Nicolas; Osma-Ruiz, Victor; Castellanos-Dominguez, German. IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2011, 58 (02): 370-379

[3] Baghai-Ravary L., 2013, AUTOMATIC SPEECH SIG, P21

[4] Coles S., 2001, INTRO STAT MODELING, V208

[5] COMPARISON OF PARAMETRIC REPRESENTATIONS FOR MONOSYLLABIC WORD RECOGNITION IN CONTINUOUSLY SPOKEN SENTENCES [J]. DAVIS, SB; MERMELSTEIN, P. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04): 357-366

[6] A CEPSTRUM-BASED TECHNIQUE FOR DETERMINING A HARMONICS-TO-NOISE RATIO IN SPEECH SIGNALS [J]. DEKROM, G. JOURNAL OF SPEECH AND HEARING RESEARCH, 1993, 36 (02): 254-266

[7] Automatic detection of voice impairments by means of short-term cepstral parameters and neural network based detectors [J]. Godino-Llorente, JI; Gómez-Vilda, P. IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2004, 51 (02): 380-384

[8] Dimensionality reduction of a pathological voice quality assessment system based on Gaussian mixture models and short-term cepstral parameters [J]. Godino-Llorente, Juan Ignacio; Gomez-Vilda, Pedro; Blanco-Velasco, Manuel. IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2006, 53 (10): 1943-1953

[9] Glottal characteristics of female speakers: Acoustic correlates [J]. Hanson, HM. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1997, 101 (01): 466-481

[10] Glottal characteristics of male speakers: Acoustic correlates and comparison with female data [J]. Hanson, HM; Chuang, ES. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1999, 106 (02): 1064-1077
