A Decade of Discriminative Language Modeling for Automatic Speech Recognition

Times cited: 0
Authors
Saraclar, Murat [1]
Dikici, Erinc [1]
Arisoy, Ebru [2]
Affiliations
[1] Department of Electrical and Electronics Engineering, Bogazici University, TR-34342 Istanbul, Turkey
[2] Department of Electrical and Electronics Engineering, MEF University, TR-34396 Istanbul, Turkey
Source
SPEECH AND COMPUTER (SPECOM 2015) | 2015 / Vol. 9319
Keywords
Automatic speech recognition; Discriminative training; Language modeling; Reranking; Ranking
DOI
10.1007/978-3-319-23132-7_2
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper summarizes research on discriminative language modeling, focusing on its application to automatic speech recognition (ASR). A discriminative language model (DLM) is typically a linear or log-linear model consisting of a weight vector associated with a feature-vector representation of a sentence. This flexible representation can include linguistically and statistically motivated features that incorporate morphological and syntactic information. At test time, DLMs are used to rerank the output of an ASR system, represented as an N-best list or a lattice. During training, both positive and negative examples are used, with the aim of directly optimizing the error rate. Various machine learning methods, including the structured perceptron, large-margin methods and maximum regularized conditional log-likelihood, have been used to estimate the parameters of DLMs. Typically, the positive examples for DLM training come from the manual transcriptions of acoustic data, while the negative examples are obtained by processing the same acoustic data with an ASR system. Recent research generalizes DLM training either by using automatic transcriptions for the positive examples or by simulating the negative examples.
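The linear reranking and structured-perceptron training described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions. The feature map is simply unigram and bigram counts. Each hypothesis score is base_weight * (baseline ASR score) + w . Phi(h), with base_weight held fixed. The positive example is the lowest-word-error (oracle) hypothesis in the N-best list, measured against the manual transcription. Weights are averaged over training steps. All function names (`ngram_features`, `word_errors`, `dlm_score`, `rerank`, `train_perceptron`) are hypothetical.

```python
from collections import Counter, defaultdict

def ngram_features(words, n=2):
    # Hypothetical feature map Phi(h): counts of all 1..n-grams, with boundary markers.
    padded = ["<s>"] + list(words) + ["</s>"]
    feats = Counter()
    for order in range(1, n + 1):
        for i in range(len(padded) - order + 1):
            feats[" ".join(padded[i:i + order])] += 1
    return feats

def word_errors(ref, hyp):
    # Word-level Levenshtein distance (substitutions + insertions + deletions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def dlm_score(weights, hyp, base_weight=1.0):
    # Linear score: base_weight * baseline ASR score + w . Phi(h).
    words, base_score = hyp
    phi = ngram_features(words)
    return base_weight * base_score + sum(weights.get(f, 0.0) * v for f, v in phi.items())

def rerank(weights, nbest, base_weight=1.0):
    # Return the highest-scoring (words, base_score) pair in the N-best list.
    return max(nbest, key=lambda h: dlm_score(weights, h, base_weight))

def train_perceptron(data, epochs=5, base_weight=1.0):
    # data: list of (reference_words, nbest); nbest: list of (words, base_score).
    weights = defaultdict(float)
    totals = defaultdict(float)  # running sum of weights, used for averaging
    steps = 0
    for _ in range(epochs):
        for ref, nbest in data:
            # Positive example (assumption): oracle = fewest word errors vs. the reference.
            oracle = min(nbest, key=lambda h: word_errors(ref, h[0]))
            # Negative example: the current top-scoring hypothesis under the model.
            best = rerank(weights, nbest, base_weight)
            if best[0] != oracle[0]:
                for f, v in ngram_features(oracle[0]).items():
                    weights[f] += v
                for f, v in ngram_features(best[0]).items():
                    weights[f] -= v
            steps += 1
            for f, v in weights.items():
                totals[f] += v
    return {f: v / steps for f, v in totals.items()}

# Toy usage: the baseline score alone prefers the misrecognized hypothesis.
ref = "the cat sat".split()
nbest = [("the cat sad".split(), 0.0), ("the cat sat".split(), -0.5)]
print(rerank({}, nbest)[0])        # ['the', 'cat', 'sad']
w = train_perceptron([(ref, nbest)], epochs=3)
print(rerank(w, nbest)[0])         # ['the', 'cat', 'sat']
```

After one update, the n-grams that the two hypotheses share cancel out. Only the distinguishing n-grams ("sat", "cat sat", and so on) receive nonzero weights, and that is enough to overturn the baseline ranking. The systems the abstract surveys could swap in richer morphological or syntactic features, or a large-margin or conditional log-likelihood objective, in place of this perceptron update.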
Pages: 11-22
Page count: 12