SVM-Based Detection of Misannotated Words in Read Speech Corpora

Cited by: 0
Authors
Matousek, Jindrich [1]
Tihelka, Daniel [1]
Affiliation
[1] Univ W Bohemia, Fac Sci Appl, Dept Cybernet, Univ 8, Plzen 30614, Czech Republic
Source
TEXT, SPEECH, AND DIALOGUE, TSD 2013 | 2013 / Vol. 8082
Keywords
annotation error detection; classification; support vector machine; read speech corpora; annotation
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates the automatic detection of misannotated words in single-speaker read-speech corpora. A support vector machine (SVM) classifier is proposed to detect the misannotated words, and its performance is evaluated with respect to various word-level feature sets. The SVM classifier performs very well, achieving high precision and recall with an F1 measure of almost 88%. This is a statistically significant improvement over the traditionally used outlier-based detection method.
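To make the reported metrics concrete, the following minimal sketch shows how precision, recall, and the F1 measure relate when "misannotated word" is treated as the positive class. The function name and the counts are hypothetical, chosen for illustration only; they are not taken from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) for the positive class.

    tp: misannotated words correctly flagged
    fp: correctly annotated words wrongly flagged
    fn: misannotated words that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical confusion counts (assumption, not results from the paper)
p, r, f1 = precision_recall_f1(tp=88, fp=12, fn=12)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two scores, so a value near 88% requires both precision and recall to be high at once.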
Pages: 457-464
Number of pages: 8