Symbolic-to-statistical hybridization: extending generation-heavy machine translation

Cited by: 11
Authors
Habash, Nizar [1]
Dorr, Bonnie [2]
Monz, Christof [3]
Affiliations
[1] Columbia Univ, Ctr Computat Learning Syst, New York, NY 10027 USA
[2] Univ Maryland, Inst Adv Comp Studies, College Pk, MD 20742 USA
[3] Univ Amsterdam, Inst Informat, Amsterdam, Netherlands
Keywords
Hybrid machine translation; Generation-heavy machine translation; Statistical machine translation; Arabic-English machine translation
DOI
10.1007/s10590-009-9056-7
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The last few years have witnessed an increasing interest in hybridizing surface-based statistical approaches and rule-based symbolic approaches to machine translation (MT). Much of that work is focused on extending statistical MT systems with symbolic knowledge and components. In the brand of hybridization discussed here, we go in the opposite direction: adding statistical bilingual components to a symbolic system. Our base system is Generation-heavy machine translation (GHMT), a primarily symbolic asymmetrical approach that addresses the issue of Interlingual MT resource poverty in source-poor/target-rich language pairs by exploiting symbolic and statistical target-language resources. GHMT's statistical components are limited to target-language models, which arguably makes it a simple form of a hybrid system. We extend the hybrid nature of GHMT by adding statistical bilingual components. We also describe the details of retargeting it to Arabic-English MT. The morphological richness of Arabic brings several challenges to the hybridization task. We conduct an extensive evaluation of multiple system variants. Our evaluation shows that this new variant of GHMT, a primarily symbolic system extended with monolingual and bilingual statistical components, has a higher degree of grammaticality than a phrase-based statistical MT system, where grammaticality is measured in terms of correct verb-argument realization and long-distance dependency translation.
Pages: 23-63
Number of pages: 41