Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks

Cited by: 12
Authors
Amiriparian, Shahin [1]
Baird, Alice [1]
Julka, Sahib [1]
Alcorn, Alyssa [2]
Ottl, Sandra [1]
Petrovic, Suncica [3]
Ainger, Eloise [2]
Cummins, Nicholas [1]
Schuller, Bjoern [1,4]
Affiliations
[1] Univ Augsburg, Embedded Intelligence Hlth Care & Wellbeing, Augsburg, Germany
[2] UCL Inst Educ, Ctr Res Autism & Educ, London, England
[3] Serbian Soc Autism, Belgrade, Serbia
[4] Imperial Coll London, GLAM, London, England
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
autism spectrum conditions; vocal abnormalities; echolalia; convolutional recurrent neural network
DOI
10.21437/Interspeech.2018-1772
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Autism spectrum conditions (ASC) are a set of neuro-developmental conditions partly characterised by difficulties with communication. Individuals with ASC can show a variety of atypical speech behaviours, including echolalia, the 'echoing' of another's speech. We herein introduce a new dataset of 15 Serbian children with ASC in a human-robot interaction scenario, annotated for the presence of echolalia amongst other ASC vocal behaviours. From this, we propose a four-class classification problem and investigate the suitability of a 2D convolutional neural network augmented with a recurrent neural network with bidirectional long short-term memory cells for the proposed task of echolalia recognition. In this approach, log Mel-spectrograms are first generated from the audio recordings and fed into the convolutional layers to extract high-level spectral features. The subsequent recurrent layers learn the long-term temporal context from the obtained features. Finally, a feed-forward neural network with softmax activation classifies each instance. To evaluate the performance of our deep learning approach, we use leave-one-subject-out cross-validation. Key results indicate the suitability of our approach, which achieves an unweighted average recall of 83.5%.
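For illustration, below is a minimal PyTorch sketch of the pipeline the abstract describes (2D convolutions over log Mel-spectrograms, a bidirectional LSTM over the resulting feature sequence, and a feed-forward softmax output). The layer sizes, number of Mel bins, and clip-level pooling are our assumptions for the sketch, not the authors' published configuration.

```python
# Hypothetical sketch of the described CNN + BiLSTM + softmax architecture.
# Hyperparameters (channel counts, hidden size, 64 Mel bins) are illustrative
# assumptions, not the configuration reported in the paper.
import torch
import torch.nn as nn

class EcholaliaCRNN(nn.Module):
    """2D CNN front-end, BiLSTM temporal model, feed-forward classifier."""

    def __init__(self, n_mels=64, n_classes=4):
        super().__init__()
        # Convolutional layers over (mel, time) patches of the log Mel-spectrogram
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                      # halves both mel and time axes
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Bidirectional LSTM over the downsampled time axis
        self.rnn = nn.LSTM(
            input_size=64 * (n_mels // 4),        # channels x reduced mel bins
            hidden_size=128,
            batch_first=True,
            bidirectional=True,
        )
        # Feed-forward output layer; softmax is applied inside the loss
        self.fc = nn.Linear(2 * 128, n_classes)

    def forward(self, x):                          # x: (batch, 1, n_mels, frames)
        h = self.conv(x)                           # (batch, 64, n_mels/4, frames/4)
        h = h.permute(0, 3, 1, 2).flatten(2)       # (batch, frames/4, features)
        h, _ = self.rnn(h)                         # (batch, frames/4, 256)
        return self.fc(h[:, -1])                   # clip-level class logits

model = EcholaliaCRNN()
logits = model(torch.randn(8, 1, 64, 200))         # 8 clips, 64 mels, 200 frames
print(logits.shape)                                # torch.Size([8, 4])
```

Training such a model would typically use `nn.CrossEntropyLoss`, which applies the softmax internally to the logits returned above.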
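The evaluation protocol (leave-one-subject-out cross-validation scored with unweighted average recall, i.e. macro-averaged recall) can be sketched with scikit-learn. The random feature matrix, subject grouping, and linear classifier below are placeholders standing in for the CRNN and the real dataset.

```python
# Hypothetical sketch of leave-one-subject-out (LOSO) evaluation with UAR.
# Features, labels, and the LinearSVC stand-in are placeholders.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import recall_score
from sklearn.svm import LinearSVC

X = np.random.randn(150, 40)             # placeholder features (150 clips)
y = np.random.randint(0, 4, size=150)    # four vocalisation classes
subjects = np.repeat(np.arange(15), 10)  # 15 children, 10 clips each

y_pred = np.empty_like(y)
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    # Each fold holds out all clips of exactly one child
    clf = LinearSVC().fit(X[train_idx], y[train_idx])
    y_pred[test_idx] = clf.predict(X[test_idx])

# UAR = unweighted average recall = recall averaged equally over classes
uar = recall_score(y, y_pred, average="macro")
print(f"UAR: {uar:.3f}")
```

Grouping folds by subject ensures that no child's recordings appear in both training and test data, which is the point of the leave-one-subject-out protocol; UAR weights all four classes equally regardless of class imbalance.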
Pages: 2334-2338 (5 pages)