Impact of Word Classing on Shrinkage-Based Language Models

Cited by: 0
Authors
Sarikaya, Ruhi [1]
Chen, Stanley F. [1]
Sethy, Abhinav [1]
Ramabhadran, Bhuvana [1]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 | 2010
Keywords
word classing; exponential models; Model M
DOI
Not available
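Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract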
This paper investigates the impact of word classing on a recently proposed shrinkage-based language model, Model M [5]. Model M, a class-based n-gram model, has been shown to significantly outperform word-based n-gram models on a variety of domains. In past work, word classes for Model M were induced automatically from unlabeled text using the algorithm of [2]. We take a closer look at the classing and attempt to find out whether improved classing would also translate to improved performance. In particular, we explore the use of manually assigned classes, part-of-speech (POS) tags, and dialog state information, considering both hard classing and soft classing. In experiments with a conversational dialog system (human-machine dialog) and a speech-to-speech translation system (human-human dialog), we find that better classing can improve Model M performance by up to 3% absolute in word error rate.
Pages: 1804-1807
Page count: 4
Related papers
16 in total
[1] Brown P. F., 1992, Computational Linguistics, V18, P467
[2] Structured language modeling [J]. Chelba, C; Jelinek, F. COMPUTER SPEECH AND LANGUAGE, 2000, 14 (04): 283-332
[3] Chen S., 1996, ACL
[4] Chen S. F., 2009, P HLT NAACL BOULD CO
[5] Chen S. F., 2009, P IEEE ASRU MER IT
[6] Chen S. F., 2009, 24829 IBM RC RES DIV
[7] Cui J., 2007, P 2007 IEEE AUT SPEE, P171
[8] Emami A., 2003, P IEEE INT C AC SPEE, V1, P372
[9] Using semantic analysis to improve speech recognition performance [J]. Erdogan, H; Sarikaya, R; Chen, SF; Gao, YQ; Picheny, M. COMPUTER SPEECH AND LANGUAGE, 2005, 19 (03): 321-343
[10] Galescu L., 1999, P EUR, P537