Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks

Cited by: 12
Authors
Amiriparian, Shahin [1]
Baird, Alice [1]
Julka, Sahib [1]
Alcorn, Alyssa [2]
Ottl, Sandra [1]
Petrovic, Suncica [3]
Ainger, Eloise [2]
Cummins, Nicholas [1]
Schuller, Bjoern [1,4]
Affiliations
[1] Univ Augsburg, Embedded Intelligence Hlth Care & Wellbeing, Augsburg, Germany
[2] UCL Inst Educ, Ctr Res Autism & Educ, London, England
[3] Serbian Soc Autism, Belgrade, Serbia
[4] Imperial Coll London, GLAM, London, England
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
autism spectrum conditions; vocal abnormalities; echolalia; convolutional recurrent neural network
DOI
10.21437/Interspeech.2018-1772
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Autism spectrum conditions (ASC) are a set of neuro-developmental conditions partly characterised by difficulties with communication. Individuals with ASC can show a variety of atypical speech behaviours, including echolalia, the 'echoing' of another's speech. We herein introduce a new dataset of 15 Serbian children with ASC in a human-robot interaction scenario, annotated for the presence of echolalia amongst other ASC vocal behaviours. From this, we propose a four-class classification problem and investigate the suitability of a 2D convolutional neural network augmented with a recurrent neural network with bidirectional long short-term memory cells for the proposed task of echolalia recognition. In this approach, log Mel-spectrograms are first generated from the audio recordings and fed into the convolutional layers to extract high-level spectral features. The subsequent recurrent layers learn the long-term temporal context from the obtained features. Finally, a feed-forward neural network with softmax activation classifies each instance. To evaluate the performance of our deep learning approach, we use leave-one-subject-out cross-validation. Key results indicate the suitability of our approach, which achieves an unweighted average recall of 83.5%.
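For illustration, below is a minimal PyTorch sketch of the pipeline the abstract describes (2D convolutions over log Mel-spectrograms, a bidirectional LSTM over the resulting feature sequence, and a feed-forward softmax output). The layer sizes, number of Mel bins, and clip-level pooling are our assumptions for the sketch, not the authors' published configuration.

```python
# Hypothetical sketch of the described CNN + BiLSTM + softmax architecture.
# Hyperparameters (channel counts, hidden size, 64 Mel bins) are illustrative
# assumptions, not the configuration reported in the paper.
import torch
import torch.nn as nn

class EcholaliaCRNN(nn.Module):
    """2D CNN front-end, BiLSTM temporal model, feed-forward classifier."""

    def __init__(self, n_mels=64, n_classes=4):
        super().__init__()
        # Convolutional layers over (mel, time) patches of the log Mel-spectrogram
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                      # halves both mel and time axes
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Bidirectional LSTM over the downsampled time axis
        self.rnn = nn.LSTM(
            input_size=64 * (n_mels // 4),        # channels x reduced mel bins
            hidden_size=128,
            batch_first=True,
            bidirectional=True,
        )
        # Feed-forward output layer; softmax is applied inside the loss
        self.fc = nn.Linear(2 * 128, n_classes)

    def forward(self, x):                          # x: (batch, 1, n_mels, frames)
        h = self.conv(x)                           # (batch, 64, n_mels/4, frames/4)
        h = h.permute(0, 3, 1, 2).flatten(2)       # (batch, frames/4, features)
        h, _ = self.rnn(h)                         # (batch, frames/4, 256)
        return self.fc(h[:, -1])                   # clip-level class logits

model = EcholaliaCRNN()
logits = model(torch.randn(8, 1, 64, 200))         # 8 clips, 64 mels, 200 frames
print(logits.shape)                                # torch.Size([8, 4])
```

Training such a model would typically use `nn.CrossEntropyLoss`, which applies the softmax internally to the logits returned above.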
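The evaluation protocol (leave-one-subject-out cross-validation scored with unweighted average recall, i.e. macro-averaged recall) can be sketched with scikit-learn. The random feature matrix, subject grouping, and linear classifier below are placeholders standing in for the CRNN and the real dataset.

```python
# Hypothetical sketch of leave-one-subject-out (LOSO) evaluation with UAR.
# Features, labels, and the LinearSVC stand-in are placeholders.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import recall_score
from sklearn.svm import LinearSVC

X = np.random.randn(150, 40)             # placeholder features (150 clips)
y = np.random.randint(0, 4, size=150)    # four vocalisation classes
subjects = np.repeat(np.arange(15), 10)  # 15 children, 10 clips each

y_pred = np.empty_like(y)
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    # Each fold holds out all clips of exactly one child
    clf = LinearSVC().fit(X[train_idx], y[train_idx])
    y_pred[test_idx] = clf.predict(X[test_idx])

# UAR = unweighted average recall = recall averaged equally over classes
uar = recall_score(y, y_pred, average="macro")
print(f"UAR: {uar:.3f}")
```

Grouping folds by subject ensures that no child's recordings appear in both training and test data, which is the point of the leave-one-subject-out protocol; UAR weights all four classes equally regardless of class imbalance.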
Pages: 2334-2338 (5 pages)