Language recognition with discriminative keyword selection

Cited by: 25
Authors
Richardson, F. S. [1]
Campbell, W. M. [1]
Affiliations
[1] MIT, Lincoln Lab, Cambridge, MA 02139 USA
Source
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 | 2008
Keywords
language recognition; support vector machines
DOI
10.1109/ICASSP.2008.4518567
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
One commonly used approach for language recognition is to convert the input speech into a sequence of tokens such as words or phones and then to use these token sequences to determine the target language. The language classification is typically performed by extracting N-gram statistics from the token sequences and then using an N-gram language model or support vector machine (SVM) to perform the classification. One problem with these approaches is that the number of N-grams grows exponentially as the order N is increased. This is especially problematic for an SVM classifier as each utterance is represented as a distinct N-gram vector. In this paper we propose a novel approach for modeling higher order N-grams using an SVM via an alternating filter-wrapper feature selection method. We demonstrate the effectiveness of this technique on the NIST 2007 language recognition task.
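To make the N-gram growth problem in the abstract concrete, here is a minimal Python sketch that turns a token sequence into N-gram relative frequencies and computes the upper bound on distinct N-gram types. The function names, the toy token sequence, and the phone inventory size of 50 are all hypothetical. The `expand_candidates` helper shows only one plausible way to restrict higher-order candidates to those that contain an already selected lower-order N-gram; the abstract does not describe the authors' alternating filter-wrapper procedure in detail, so this is an illustrative assumption, not their algorithm. No SVM is trained here.

```python
from collections import Counter
from typing import Dict, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_frequencies(tokens: Sequence[str], n: int) -> Dict[NGram, float]:
    """Relative n-gram frequencies: a sparse feature vector for one utterance."""
    counts = ngram_counts(tokens, n)
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def max_ngram_types(vocab_size: int, n: int) -> int:
    """Upper bound on distinct n-grams: grows exponentially with the order n."""
    return vocab_size ** n


def expand_candidates(tokens: Sequence[str], selected: Set[NGram], n: int) -> Counter:
    """Hypothetical pruning step (an assumption, not taken from the paper):
    keep only (n+1)-grams whose leading or trailing n-gram is in `selected`."""
    counts = ngram_counts(tokens, n + 1)
    return Counter({
        gram: c for gram, c in counts.items()
        if gram[:-1] in selected or gram[1:] in selected
    })


# Toy utterance (hypothetical phone tokens).
tokens = ["a", "b", "a", "b", "c"]
bigram_features = ngram_frequencies(tokens, 2)
trigram_candidates = expand_candidates(tokens, {("b", "c")}, 2)
```

With a hypothetical inventory of 50 phones, `max_ngram_types(50, 3)` already allows 125000 distinct trigrams, which is why some form of selection is needed before higher orders can be used as classifier features.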
Pages: 4145-4148
Page count: 4