Lexical Resources for Hindi Marathi MT

Cited by: 0
Authors
Sreelekha, S. [1]
Bhattacharyya, Pushpak [1]
Malathi, D. [1]
Affiliation
[1] SRM Univ, Indian Inst Technol Bombay, Dept Comp Sci & Engn, Bombay, Maharashtra, India
Source
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014
Keywords
Statistical Machine Translation; IndoWordnet; Lexical Resources;
DOI
Not available
Chinese Library Classification
H0 [Linguistics];
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
In this paper we describe ways of utilizing lexical resources to improve the quality of statistical machine translation. We augmented the training corpus with various lexical resources, such as the IndoWordnet semantic relation set, function words, kridanta pairs, and verb phrases. We augmented the parallel corpora in two ways: (a) additional vocabulary and (b) inflected word forms. We describe case studies and evaluations, and give a detailed error analysis for both Marathi-to-Hindi and Hindi-to-Marathi machine translation systems. From the evaluations we observed an order-of-magnitude improvement in translation quality. Lexical resources do help lift performance when parallel corpora are scarce.
Pages: 9