An Automatically Aligned Corpus of Child-directed Speech

Cited by: 2
Authors
Elsner, Micha [1]
Ito, Kiwako [1]
Affiliations
[1] Ohio State Univ, Dept Linguist, Columbus, OH 43210 USA
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Funding
U.S. National Science Foundation;
Keywords
1.18 Special session: Data collection; transcription and annotation issues in child language acquisition settings; 1.11 L1 acquisition and bilingual acquisition; 8.8 Acoustic model adaptation; MOTHERS;
DOI
10.21437/Interspeech.2017-379
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Forced alignment would enable phonetic analyses of child-directed speech (CDS) corpora that have existing transcriptions. But existing alignment systems are inaccurate due to the atypical phonetics of CDS. We adapt a Kaldi forced alignment system to CDS by extending the dictionary and providing it with heuristically derived hints for vowel locations. Using this system, we present a new time-aligned CDS corpus with a million aligned segments. We manually correct a subset of the corpus and demonstrate that our system is 70% accurate. Both our automatic and manually corrected alignments are publicly available at osf.io/ke44q.
Pages: 1736-1740
Page count: 5