Segmentation of expiratory and inspiratory sounds in baby cry audio recordings using hidden Markov models

Cited by: 15

Authors:

Aucouturier, Jean-Julien [1]

Nonaka, Yulri [2]

Katahira, Kentaro [2]

Okanoya, Kazuo [2]

Affiliations:

[1] Temple Univ, Dept Comp & Informat Sci, Minato Ku, Tokyo 1060047, Japan

[2] RIKEN Brain Sci Inst, JST ERATO Okanoya Emot Informat Project, Wako, Saitama 3510198, Japan

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2011 / Vol. 130 / No. 05

Keywords:

INFANT CRY; RECOGNITION;

DOI:

10.1121/1.3641377

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403;

Abstract:

The paper describes an application of machine learning techniques to identify expiratory and inspiratory phases in audio recordings of human baby cries. Crying episodes were recorded from 14 infants, spanning four vocalization contexts in their first 12 months of age; recordings from three individuals were annotated manually to identify expiratory and inspiratory sounds and used as training examples to automatically segment the recordings of the other 11 individuals. The proposed algorithm uses a hidden Markov model architecture, in which state likelihoods are estimated either with Gaussian mixture models or by converting the classification decisions of a support vector machine. The algorithm yields up to 95% classification precision (86% average), and its performance generalizes across different babies, different ages, and vocalization contexts. The technique offers an opportunity to quantify expiration duration, crying rate, and other time-related characteristics of baby crying for screening, diagnosis, and research purposes over large populations of infants. © 2011 Acoustical Society of America. [DOI: 10.1121/1.3641377]
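The hidden Markov model idea in the abstract can be illustrated with a small decoding sketch. It is not the authors' implementation: the abstract does not say which decoding algorithm, state set, or parameters were used, so Viterbi decoding, the two-state layout, the function name `viterbi`, and every number below are assumptions chosen for illustration only. The per-frame likelihoods stand in for what a Gaussian mixture model or a converted support vector machine output might provide.

```python
import math

# Hypothetical two-state layout; the record does not list the paper's states.
STATES = ("expiration", "inspiration")


def viterbi(frame_loglik, log_trans, log_prior):
    """Return the most likely state index sequence.

    frame_loglik[t][j]: log-likelihood of frame t under state j
    log_trans[i][j]:    log-probability of moving from state i to state j
    log_prior[j]:       log-probability of starting in state j
    """
    n_states = len(log_prior)
    # delta[j] is the best log-score of any path ending in state j
    delta = [log_prior[j] + frame_loglik[0][j] for j in range(n_states)]
    backptr = []
    for obs in frame_loglik[1:]:
        prev = delta
        pointers, delta = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best_i)
            delta.append(prev[best_i] + log_trans[best_i][j] + obs[j])
        backptr.append(pointers)
    # Trace the best path backwards from the best final state
    state = max(range(n_states), key=lambda j: delta[j])
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def to_log(rows):
    """Convert a table of probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Made-up per-frame likelihoods (expiration, inspiration) for six frames.
frame_lik = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6],
             [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
# "Sticky" transitions: a state tends to persist from frame to frame.
trans = [[0.9, 0.1], [0.1, 0.9]]
prior = [0.5, 0.5]

path = viterbi(to_log(frame_lik), to_log(trans), [math.log(p) for p in prior])
labels = [STATES[s] for s in path]
framewise = [max(range(2), key=lambda j: row[j]) for row in frame_lik]
print(labels)
```

In this toy input, a frame-by-frame decision would label the third frame as inspiration, whereas the decoded path keeps it as expiration because a single-frame switch is penalized by the transition probabilities. This is the kind of temporal smoothing that makes an HMM attractive for segmentation compared with classifying each frame independently.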


Pages: 2969-2977

Page count: 9
