Using automatic alignment to analyze endangered language data: Testing the viability of untrained alignment

Cited by: 42
Authors
DiCanio, Christian [1]
Nam, Hosung [1]
Whalen, Douglas H. [1]
Bunnell, H. Timothy [2]
Amith, Jonathan D. [3]
Castillo Garcia, Rey [4]
Affiliations
[1] Haskins Labs Inc, New Haven, CT 06511 USA
[2] Nemours Biomed Res, Ctr Pediat Auditory & Speech Sci, Wilmington, DE 19803 USA
[3] Gettysburg Coll, Dept Anthropol, Gettysburg, PA 17325 USA
[4] Secretaria Educ Publ, Chilpancingo 39090, Guerrero, Mexico
Funding
National Science Foundation (USA)
Keywords
SPEECH; PERCEPTION
DOI
10.1121/1.4816491
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
While efforts to document endangered languages have steadily increased, the phonetic analysis of endangered language data remains a challenge. The transcription of large documentation corpora is, by itself, a tremendous feat. Yet, the process of segmentation remains a bottleneck for research with data of this kind. This paper examines whether a speech processing tool, forced alignment, can facilitate the segmentation task for small data sets, even when the target language differs from the training language. The authors also examined whether a phone set with contextualization outperforms a more general one. The accuracy of two forced aligners trained on English (HMALIGN and P2FA) was assessed using corpus data from Yoloxochitl Mixtec. Overall, agreement performance was relatively good, with accuracy at 70.9% within 30 ms for HMALIGN and 65.7% within 30 ms for P2FA. Segmental and tonal categories influenced accuracy as well. For instance, additional stop allophones in HMALIGN's phone set aided alignment accuracy. Agreement differences between aligners also corresponded closely with the types of data on which the aligners were trained. Overall, using existing alignment systems was found to have potential for making phonetic analysis of small corpora more efficient, with more allophonic phone sets providing better agreement than general ones. (C) 2013 Acoustical Society of America.
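The agreement figures above (for example, 70.9% within 30 ms) can be read as the share of automatically placed boundaries that fall within a fixed tolerance of the corresponding hand-labeled boundaries. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function name `agreement_within`, the assumption that boundaries are already paired one-to-one, and the sample times are all hypothetical.

```python
from typing import Sequence


def agreement_within(
    manual: Sequence[float],
    automatic: Sequence[float],
    tolerance_s: float = 0.030,
) -> float:
    """Return the percentage of boundary pairs whose absolute time
    difference is at most `tolerance_s` seconds.

    Assumes (hypothetically) that manual[i] and automatic[i] mark the
    same segment boundary; pairing the boundaries is not handled here.
    """
    if len(manual) != len(automatic):
        raise ValueError("boundary lists must have the same length")
    if not manual:
        raise ValueError("at least one boundary pair is required")
    hits = sum(
        1 for m, a in zip(manual, automatic) if abs(m - a) <= tolerance_s
    )
    return 100.0 * hits / len(manual)


# Made-up boundary times in seconds, for illustration only.
manual_times = [0.100, 0.250, 0.400, 0.620]
aligner_times = [0.110, 0.290, 0.405, 0.600]
score = agreement_within(manual_times, aligner_times)
```

With these made-up times, three of the four boundaries differ by less than 30 ms, so `score` is 75.0. Changing `tolerance_s` gives the same measure at other thresholds.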
Pages: 2235-2246
Number of pages: 12