FRAME-BASED PHONOTACTIC LANGUAGE IDENTIFICATION

被引:0
作者
Han, Kyu [1 ]
Pelecanos, Jason [1 ]
机构
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
来源
2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012) | 2012年
关键词
DARPA RATS; language identification; phonotactic; phone event modeling with timing information;
D O I
暂无
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
This paper describes a frame-based phonotactic Language Identification (LID) system, which was used for the LID evaluation of the Robust Automatic Transcription of Speech (RATS) program by the Defense Advanced Research Projects Agency (DARPA). The proposed approach utilizes features derived from frame-level phone log-likelihoods from a phone recognizer. It is an attempt to capture not only phone sequence information but also short-term timing information for phone N-gram events, which is lacking in conventional phonotactic LID systems that simply count phone N-gram events. Based on this new method, we achieved 26% relative improvement in terms of C-avg for the RATS LID evaluation data compared to phone N-gram counts modeling. We also observed that it had a significant impact on score combination with our best acoustic system based on Mel-Frequency Cepstral Coefficients (MFCCs).
引用
收藏
页码:303 / 306
页数:4
相关论文
共 14 条
[1]  
Campbell W. M., 2006, P ICASSP MAY 14 19
[2]  
Campbell WM, 2004, 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, P73
[3]   LIBSVM: A Library for Support Vector Machines [J].
Chang, Chih-Chung ;
Lin, Chih-Jen .
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
[4]  
D'Haro L. F., 2012, P INT SEPT 9 13
[5]  
Gauvain J. L., 2004, P ICSLP, P1283
[6]  
Mikolov T, 2010, ODYSSEY 2010: THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP, P251
[7]   Speaker verification using adapted Gaussian mixture models [J].
Reynolds, DA ;
Quatieri, TF ;
Dunn, RB .
DIGITAL SIGNAL PROCESSING, 2000, 10 (1-3) :19-41
[8]  
Singer E., 2003, P INTERSPEECH, P1345
[9]  
Stolcke A., 2002, P INTERSPEECH, P901
[10]  
Torres-Carrasquillo P., 2002, ICSLP, P89