Montreal Forced Aligner: trainable text-speech alignment using Kaldi

Cited by: 705
Authors
McAuliffe, Michael [1]
Socolof, Michaela [2]
Mihuc, Sarah [1]
Wagner, Michael [1,3]
Sonderegger, Morgan [1,3]
Affiliations
[1] McGill Univ, Dept Linguist, Montreal, PQ, Canada
[2] Univ Maryland, Dept Linguist, College Pk, MD 20742 USA
[3] McGill Univ, Ctr Res Brain Language & Mus, Montreal, PQ, Canada
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Funding
Quebec Medical Research Foundation (Canada);
Keywords
forced alignment; automatic segmentation; acoustic analysis;
DOI
10.21437/Interspeech.2017-1386
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
We present the Montreal Forced Aligner (MFA), a new open-source system for speech-text alignment. MFA is an update to the Prosodylab-Aligner, and maintains its key functionality of trainability on new data, as well as incorporating improved architecture (triphone acoustic models and speaker adaptation), and other features. MFA uses Kaldi instead of HTK, allowing MFA to be distributed as a stand-alone package, and to exploit parallel processing for computationally intensive training and scaling to larger datasets. We evaluate MFA's performance on aligning word and phone boundaries in English conversational and laboratory speech, relative to human-annotated boundaries, focusing on the effects of aligner architecture and training on the data to be aligned. MFA performs well relative to two existing open-source aligners with simpler architecture (Prosodylab-Aligner and FAVE), and both its improved architecture and training on data to be aligned generally result in more accurate boundaries.
Pages: 498-502
Page count: 5