Montreal Forced Aligner: trainable text-speech alignment using Kaldi

Cited by: 705

Authors:

McAuliffe, Michael ^{[1]}

Socolof, Michaela ^{[2]}

Mihuc, Sarah ^{[1]}

Wagner, Michael ^{[1,3]}

Sonderegger, Morgan ^{[1,3]}

Affiliations:

[1] McGill Univ, Dept Linguist, Montreal, PQ, Canada

[2] Univ Maryland, Dept Linguist, College Pk, MD 20742 USA

[3] McGill Univ, Ctr Res Brain Language & Mus, Montreal, PQ, Canada

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Funding:

Quebec Medical Research Foundation (Canada);

Keywords:

forced alignment; automatic segmentation; acoustic analysis;

DOI:

10.21437/Interspeech.2017-1386

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present the Montreal Forced Aligner (MFA), a new open source system for speech-text alignment. MFA is an update to the Prosodylab-Aligner, and maintains its key functionality of trainability on new data, as well as incorporating improved architecture (triphone acoustic models and speaker adaptation), and other features. MFA uses Kaldi instead of HTK, allowing MFA to be distributed as a stand-alone package, and to exploit parallel processing for computationally intensive training and scaling to larger datasets. We evaluate MFA's performance on aligning word and phone boundaries in English conversational and laboratory speech, relative to human-annotated boundaries, focusing on the effects of aligner architecture and training on the data to be aligned. MFA performs well relative to two existing open-source aligners with simpler architecture (Prosodylab-Aligner and FAVE), and both its improved architecture and training on data to be aligned generally result in more accurate boundaries.

Pages: 498-502

Page count: 5

References (30 total)

[1] Quantifying temporal speech reduction in French using forced speech alignment [J].