Cry-based infant pathology classification using GMMs

Cited by: 46
Authors
Alaie, Hesam Farsaie [1, 2]
Abou-Abbas, Lina [1 ]
Tadj, Chakib [1 ]
Affiliations
[1] Univ Quebec, Ecole Technol Super, MMS Lab, Dept Elect Engn, 1100 Rue Notre Dame Ouest, Montreal, PQ H3C 1K3, Canada
[2] A2446, 1100 Rue Notre Dame Ouest, Quebec City, PQ, Canada
Funding
Bill & Melinda Gates Foundation
Keywords
Gaussian mixture model; Universal background model; Mel-frequency Cepstral Coefficient; Likelihood ratio scores; Newborn infant cries; Expiratory and inspiratory cry; SPEAKER IDENTIFICATION; RECOGNITION; LIKELIHOOD; NEWBORN;
DOI
10.1016/j.specom.2015.12.001
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Traditional studies of infant cry signals have focused mainly on classifying infants by criteria other than pathology. In this paper, we introduce a noninvasive health care system that performs acoustic analysis of noisy, uncleaned infant cry signals to extract and quantitatively measure certain cry characteristics, and that classifies newborn infants as healthy or sick based solely on their cries. In this newborn cry-based diagnostic system (NCDS), static Mel-Frequency Cepstral Coefficients (MFCCs) and dynamic MFCC features are selected and extracted from both expiratory and inspiratory cry vocalizations to produce a discriminative and informative feature vector. Next, we create a unique cry pattern for each cry vocalization type and pathological condition by using the Boosting Mixture Learning (BML) method in a novel way to derive healthy or pathology subclass models separately from the Gaussian Mixture Model-Universal Background Model (GMM-UBM). The NCDS has a hierarchical scheme, that is, a treelike combination of individual classifiers. Moreover, score-level fusion of the proposed expiratory and inspiratory cry-based subsystems is performed to make a more reliable decision. The experimental results indicate that the adapted BML method yields lower error rates than the Bayesian approach or maximum a posteriori probability (MAP) adaptation, each taken as a reference method. (C) 2015 Elsevier B.V. All rights reserved.
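The likelihood ratio scoring and score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs given as lists of (weight, mean, variance) tuples, an average per-frame log-likelihood ratio against the UBM, and a simple weighted-sum fusion. The toy models, the function names, and the fusion weight `alpha` are hypothetical and chosen only for illustration; the BML and MAP adaptation steps and the hierarchical classifier tree described in the paper are not reproduced here.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM of (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def llr_score(frames, class_gmm, ubm):
    """Average per-frame log-likelihood ratio of a class model against the UBM."""
    total = sum(
        gmm_log_likelihood(f, class_gmm) - gmm_log_likelihood(f, ubm)
        for f in frames
    )
    return total / len(frames)


def fuse_scores(expiratory_score, inspiratory_score, alpha=0.5):
    """Weighted-sum score-level fusion; alpha is a hypothetical tuning weight."""
    return alpha * expiratory_score + (1.0 - alpha) * inspiratory_score


# Toy 2-D models (hypothetical values, standing in for MFCC-based models).
ubm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
pathology_gmm = [(1.0, [3.0, 3.0], [1.0, 1.0])]

# Frames close to the pathology model give a positive ratio,
# frames close to the other UBM component give a negative one.
exp_score = llr_score([[3.0, 3.0], [2.8, 3.1]], pathology_gmm, ubm)
insp_score = llr_score([[0.0, 0.0], [0.2, -0.1]], pathology_gmm, ubm)
fused = fuse_scores(exp_score, insp_score, alpha=0.5)
```

In a sketch like this, a decision would follow from comparing the fused score with a threshold tuned on held-out data; the threshold and the weighted-sum rule are assumptions, not details taken from the record.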
Pages: 28-52
Page count: 25