Classifying Autism From Crowdsourced Semistructured Speech Recordings: Machine Learning Model Comparison Study

Cited by: 32
Authors
Chi, Nathan A. [1]
Washington, Peter [2]
Kline, Aaron [1]
Husic, Arman [1]
Hou, Cathy [3]
He, Chloe [4]
Dunlap, Kaitlyn [1]
Wall, Dennis P. [1,4,5]
Affiliations
[1] Stanford Univ, Dept Pediat, Div Syst Med, 3145 Porter Dr, Palo Alto, CA 94304 USA
[2] Stanford Univ, Dept Bioengn, Stanford, CA USA
[3] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Biomed Data Sci, Stanford, CA USA
[5] Stanford Univ, Dept Psychiat & Behav Sci, Stanford, CA USA
Funding
US National Institutes of Health; US National Science Foundation
Keywords
autism; mHealth; machine learning; artificial intelligence; speech; audio; child; digital data; mobile app; diagnosis; SPECTRUM DISORDER; CHILDREN; INTERVENTION; PROSODY; HEALTH;
DOI
10.2196/35406
Chinese Library Classification
R72 [Pediatrics]
Discipline Code
100202
Abstract
Background: Autism spectrum disorder (ASD) is a neurodevelopmental disorder that results in altered behavior, social development, and communication patterns. In recent years, autism prevalence has tripled, with 1 in 44 children now affected. Given that traditional diagnosis is a lengthy, labor-intensive process that requires the work of trained physicians, significant attention has been given to developing systems that automatically detect autism. We work toward this goal by analyzing audio data, as prosody abnormalities are a signal of autism, with affected children displaying speech idiosyncrasies such as echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns.

Objective: We aimed to test the ability of machine learning approaches to aid in the detection of autism from self-recorded speech audio captured from children with ASD and neurotypical (NT) children in their home environments.

Methods: We considered three methods to detect autism in child speech: (1) random forests trained on extracted audio features (including Mel-frequency cepstral coefficients); (2) convolutional neural networks trained on spectrograms; and (3) a fine-tuned wav2vec 2.0 model, a state-of-the-art transformer-based speech recognition model. We trained our classifiers on our novel data set of cellphone-recorded child speech audio curated from the Guess What? mobile game, an app designed to crowdsource videos of children with ASD and NT children in a natural home environment.

Results: Evaluated with 5-fold cross-validation, the random forest classifier achieved 70% accuracy, the fine-tuned wav2vec 2.0 model achieved 77% accuracy, and the convolutional neural network achieved 79% accuracy when classifying children's audio as either ASD or NT.

Conclusions: Our models were able to predict autism status when trained on a varied selection of home audio clips with inconsistent recording quality, which may be more representative of real-world conditions. The results demonstrate that machine learning methods offer promise in detecting autism automatically from speech without specialized equipment.
Pages: 11
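The record does not include an implementation, so the following is only a minimal sketch of method (1) from the abstract: a random forest over summary statistics of MFCC features, scored with 5-fold cross-validation. It assumes the librosa and scikit-learn libraries; the sampling rate, feature summary, hyperparameters, and the `clips` data-loading interface are illustrative assumptions, not the authors' published configuration.

```python
# Sketch of an MFCC + random forest baseline with 5-fold cross-validation.
# All settings below are placeholder assumptions, not the paper's configuration.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def mfcc_summary(path, sr=16000, n_mfcc=13):
    """Load one audio clip and return the mean and std of each MFCC coefficient."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def evaluate(clips):
    """clips: hypothetical list of (wav_path, label) pairs, label 1 = ASD, 0 = NT."""
    X = np.stack([mfcc_summary(path) for path, _ in clips])
    y = np.array([label for _, label in clips])
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    return scores.mean(), scores.std()
```

The spectrogram CNN and fine-tuned wav2vec 2.0 pipelines described in the Methods would reuse the same cross-validation loop with different feature extractors and classifiers.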