Improving example-based machine translation with statistical collocation model

Cited by: 0
Authors
Affiliations
[1] School of Computer Science and Technology, Harbin Institute of Technology
[2] Baidu.com, Inc
Source
Liu, Z.-Y. (zhanyiliu@gmail.com) | 2012 / Volume: Chinese Academy of Sciences / Issue: 23
Keywords
Example selection; Example-based machine translation; Statistical collocation model; Translation selection
DOI
10.3724/SP.J.1001.2012.04069
Abstract
Example-based machine translation (EBMT) uses a preprocessed bilingual corpus as its main source of translation knowledge. The final translation is generated by editing examples that match the input sentence. In an EBMT system, the performance of example selection and translation selection heavily influences the quality of the final translation. This paper proposes a method that improves EBMT in three ways by using a statistical collocation model estimated from monolingual corpora. First, the statistical collocation model is used to estimate the degree of matching between the input sentence and the examples, which improves example selection. Second, translation selection is improved by evaluating the collocation strength between the translation candidates and their context. Third, the statistical collocation model detects the words in the example that collocate with the translation candidates, and these collocated words are then corrected according to the context. To evaluate the proposed method, the study conducts a series of experiments. First, the proposed methods are evaluated in a word-based EBMT system. Compared with the baseline, the methods achieve absolute improvements of 4.73 to 6.48 BLEU points on English-to-Chinese translation. The proposed translation selection method is then applied to a semi-structured EBMT system, where translation quality improves further, by 1.82 BLEU points. Human evaluation shows that the translations generated by the improved semi-structured EBMT system express most of the meaning of the source sentences and that the fluency of these translations is acceptable. © 2012 ISCAS.
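The second use described in the abstract, choosing among translation candidates by their collocation strength with the context, can be illustrated with a minimal sketch. The abstract does not say how the collocation model is estimated, so this sketch substitutes smoothed pointwise mutual information (PMI) over sentence-level co-occurrence counts from a toy monolingual corpus. The function names, the smoothing constant, and the corpus are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def train_collocation_counts(sentences):
    """Count words and unordered word pairs that co-occur within a sentence."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        unique = sorted(set(tokens))
        word_counts.update(unique)
        # combinations() over a sorted list yields pairs in sorted order
        pair_counts.update(combinations(unique, 2))
    return word_counts, pair_counts, len(sentences)


def pmi(w1, w2, word_counts, pair_counts, n_sentences, smoothing=0.1):
    """Smoothed PMI, used here as a stand-in for collocation strength."""
    pair = tuple(sorted((w1, w2)))
    total = n_sentences + smoothing
    p_pair = (pair_counts[pair] + smoothing) / total
    p1 = (word_counts[w1] + smoothing) / total
    p2 = (word_counts[w2] + smoothing) / total
    return math.log(p_pair / (p1 * p2))


def select_translation(candidates, context, word_counts, pair_counts, n_sentences):
    """Return the candidate with the highest mean PMI against the context words."""
    def score(candidate):
        total = sum(
            pmi(candidate, w, word_counts, pair_counts, n_sentences)
            for w in context
        )
        return total / max(len(context), 1)
    return max(candidates, key=score)


# Toy target-language corpus (assumption, for illustration only).
corpus = [
    ["strong", "tea"],
    ["strong", "tea", "please"],
    ["powerful", "computer"],
    ["powerful", "engine"],
    ["green", "tea"],
]
model = train_collocation_counts(corpus)
best = select_translation(["powerful", "strong"], ["tea"], *model)
print(best)  # prints: strong
```

Here "strong" wins because it co-occurs with "tea" in the toy corpus while "powerful" never does. A real system would estimate the model from large monolingual corpora and combine this score with other translation features.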
Pages: 1472-1485
Page count: 13