Automatic measurement of voice onset time using discriminative structured prediction

Cited by: 32
Authors
Sonderegger, Morgan [1, 2]
Keshet, Joseph [3]
Affiliations
[1] Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA
[2] Univ Chicago, Dept Linguist, Chicago, IL 60637 USA
[3] Toyota Technol Inst, Chicago, IL 60637 USA
Keywords
CROSS-LANGUAGE; INITIAL STOPS; SPEECH; FRENCH; WORD;
DOI
10.1121/1.4763995
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
A discriminative large-margin algorithm for automatic measurement of voice onset time (VOT) is described, considered as a case of predicting structured output from speech. Manually labeled data are used to train a function that takes as input a speech segment of an arbitrary length containing a voiceless stop, and outputs its VOT. The function is explicitly trained to minimize the difference between predicted and manually measured VOT; it operates on a set of acoustic feature functions designed based on spectral and temporal cues used by human VOT annotators. The algorithm is applied to initial voiceless stops from four corpora, representing different types of speech. Using several evaluation methods, the algorithm's performance is near human intertranscriber reliability, and compares favorably with previous work. Furthermore, the algorithm's performance is minimally affected by training and testing on different corpora, and remains essentially constant as the amount of training data is reduced to 50-250 manually labeled examples, demonstrating the method's practical applicability to new datasets. (C) 2012 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4763995]
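The approach summarized in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the feature map `phi`, the two frame-level cues (an energy-like and a voicing-like track), the passive-aggressive-style update, and every name and parameter below are assumptions chosen for illustration. The sketch only shows the general shape of the idea: score each candidate pair (burst onset, voicing onset) with a linear function of feature functions, predict the best-scoring pair, and train the weights against a loss defined as the difference between predicted and labeled VOT.

```python
def phi(x, tb, tv):
    """Hypothetical feature map for a candidate (burst onset, voicing onset) pair.
    x is a pair of per-frame lists: (energy-like cue, voicing-like cue)."""
    energy, voicing = x
    between = voicing[tb:tv]
    return [
        energy[tb] - energy[tb - 1],      # energy jump at the candidate burst
        voicing[tv] - voicing[tv - 1],    # voicing jump at the candidate voicing onset
        -sum(between) / len(between),     # penalize voicing between the two onsets
    ]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def candidates(T, min_vot=1, max_vot=50):
    """All frame pairs (tb, tv) with min_vot <= tv - tb <= max_vot (assumed limits)."""
    for tb in range(1, T - min_vot):
        for tv in range(tb + min_vot, min(tb + max_vot, T - 1) + 1):
            yield tb, tv

def loss(y, y_hat):
    """Task loss: absolute difference between labeled and predicted VOT, in frames."""
    return abs((y[1] - y[0]) - (y_hat[1] - y_hat[0]))

def predict(w, x, cost=None):
    """Highest-scoring pair; `cost` optionally adds a loss term (loss-augmented search)."""
    best, best_score = None, float("-inf")
    for y in candidates(len(x[0])):
        score = dot(w, phi(x, *y)) + (cost(y) if cost else 0.0)
        if score > best_score:
            best, best_score = y, score
    return best

def train(data, epochs=5, C=1.0):
    """Passive-aggressive-style large-margin updates (an assumed training rule)."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            y_hat = predict(w, x, cost=lambda c: loss(y, c))
            delta = [a - b for a, b in zip(phi(x, *y), phi(x, *y_hat))]
            hinge = loss(y, y_hat) - dot(w, delta)   # margin violation
            norm = dot(delta, delta)
            if hinge > 0 and norm > 0:
                tau = min(C, hinge / norm)           # capped step size
                w = [wi + tau * di for wi, di in zip(w, delta)]
    return w

def make_example(T, tb, tv):
    """Toy synthetic segment: energy steps up at tb, voicing steps up at tv."""
    energy = [0.0] * tb + [1.0] * (T - tb)
    voicing = [0.0] * tv + [1.0] * (T - tv)
    return (energy, voicing), (tb, tv)

# Usage: train on two labeled toy segments, then measure VOT on an unseen one.
w = train([make_example(40, 10, 18), make_example(40, 5, 20)])
x_new, y_new = make_example(40, 12, 25)
tb, tv = predict(w, x_new)
print("predicted VOT (frames):", tv - tb)
```

The exhaustive search over candidate pairs keeps the sketch short; real speech would need richer feature functions than these two step-like cues.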
Pages: 3965-3979
Page count: 15