Language recognition with discriminative keyword selection

Cited by: 25
Authors
Richardson, F. S. [1]
Campbell, W. M. [1]
Affiliation
[1] MIT, Lincoln Lab, Cambridge, MA 02139 USA
Source
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 | 2008
Keywords
language recognition; support vector machines
DOI
10.1109/ICASSP.2008.4518567
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
One commonly used approach for language recognition is to convert the input speech into a sequence of tokens such as words or phones and then to use these token sequences to determine the target language. The language classification is typically performed by extracting N-gram statistics from the token sequences and then using an N-gram language model or support vector machine (SVM) to perform the classification. One problem with these approaches is that the number of N-grams grows exponentially as the order N is increased. This is especially problematic for an SVM classifier as each utterance is represented as a distinct N-gram vector. In this paper we propose a novel approach for modeling higher order N-grams using an SVM via an alternating filter-wrapper feature selection method. We demonstrate the effectiveness of this technique on the NIST 2007 language recognition task.
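As a rough illustration of the N-gram representation described in the abstract, the sketch below turns one token sequence into a vector of N-gram relative frequencies and shows how quickly the number of possible N-grams grows with the order N. It is a minimal sketch, not the authors' implementation: the function names `ngram_frequencies` and `feature_space_size`, the example phone sequence, and the inventory size of 50 tokens are all assumptions made for illustration.

```python
from collections import Counter


def ngram_frequencies(tokens, n):
    """Return the relative frequency of each n-gram in one token sequence."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def feature_space_size(vocab_size, n):
    """Number of distinct n-grams possible over a token inventory of this size."""
    return vocab_size ** n


# Hypothetical phone sequence for a single utterance (illustrative only).
utterance = ["k", "ae", "t", "s", "ae", "t"]
bigrams = ngram_frequencies(utterance, 2)

# Assumed inventory of 50 phone tokens; the dimension grows as 50**N.
sizes = {n: feature_space_size(50, n) for n in (1, 2, 3, 4)}
```

With the assumed inventory of 50 tokens, the possible feature dimension goes from 2,500 for bigrams to 6,250,000 for 4-grams, which is why some form of feature selection is needed before higher order N-grams can be used as SVM input.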
Pages: 4145-4148
Page count: 4