Improving ASR Systems for Children with Autism and Language Impairment Using Domain-Focused DNN Transfer Techniques

Times cited: 25
Authors
Gale, Robert [1]
Chen, Liu [1]
Dolata, Jill [1]
van Santen, Jan [1]
Asgari, Meysam [1]
Affiliation
[1] Oregon Health & Science University (OHSU), Center for Spoken Language Understanding (CSLU), Portland, OR 97239, USA
Source
INTERSPEECH 2019 | 2019
Keywords
speech recognition; children speech recognition; autism spectrum disorder; language impairment; deep neural network; transfer learning; SPEECH; DATABASE
DOI
10.21437/Interspeech.2019-3161
Chinese Library Classification (CLC) numbers
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
This study explores building and improving an automatic speech recognition (ASR) system for children aged 6-9 years who have been diagnosed with autism spectrum disorder (ASD), language impairment (LI), or both. Working with only 1.5 hours of target data, in which children perform the Recalling Sentences task of the Clinical Evaluation of Language Fundamentals (CELF), we apply deep neural network (DNN) weight transfer techniques to adapt a large DNN model trained on the LibriSpeech corpus of adult speech. First, we search for the best proportional training rates for the individual DNN layers; our best configuration yields a 29.38% word error rate (WER). Using this configuration, we then explore how the quantity and the similarity of augmentation data affect transfer learning. We augment our training set with portions of the OGI Kids' Corpus, adding 4.6 hours of speech from typically developing speakers in kindergarten through 3rd grade. We find that 2nd-grade data alone (2nd grade corresponds approximately to the mean age of the target children) outperforms each of the other grades as well as all grades combined. After doubling the amount of data for 1st, 2nd, and 3rd grade, we again compare each grade individually as well as pairs of grades. The combination of 1st- and 2nd-grade data performs best, at 26.21% WER.
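The WER figures in the abstract follow the standard definition of the metric: the word-level edit distance (substitutions + deletions + insertions) between the recognizer's output and the reference transcript, divided by the number of reference words. The sketch below is a generic illustration of that definition, not the authors' scoring setup; the function names and example sentences are hypothetical. It also works through the reported improvement: going from 29.38% to 26.21% WER is an absolute drop of 3.17 points, roughly a 10.8% relative reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N, computed as the word-level
    Levenshtein distance divided by the number of reference words N."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling single-row DP table: at the start of iteration i, prev[j] is the
    # edit distance between the first i-1 reference words and the first j
    # hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Hypothetical helper: fraction of the baseline error removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution ("ran" -> "run") and one deletion ("the"): 2 errors / 6 words.
    print(f"{word_error_rate('the dog ran to the park', 'the dog run to park'):.3f}")
    # Reported WERs from the abstract: 29.38% -> 26.21%.
    print(f"{relative_reduction(29.38, 26.21):.4f}")
```

Note that WER is not bounded by 100%: insertions can push the edit distance above the number of reference words.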
Pages: 11-15
Page count: 5