A deep source-context feature for lexical selection in statistical machine translation

Cited by: 6
Authors
Gupta, Parth [1]
Costa-Jussa, Marta R. [2, 3]
Rosso, Paolo [1]
Banchs, Rafael E. [4]
Affiliations
[1] Univ Politecn Valencia, PRHLT Res Ctr, Camino Vera S-N, E-46022 Valencia, Spain
[2] Univ Politecn Cataluna, TALP Res Ctr, ES-08034 Barcelona, Spain
[3] Inst Politecn Nacl, Ctr Invest Comp, Av San Juan Dios Batiz, Mexico City 07738, DF, Mexico
[4] Inst Infocomm Res, Human Language Technol, 1 Fusionopolis Way, Singapore 138632, Singapore
Keywords
Natural language processing; Neural nets and related approaches; Semantics
DOI
10.1016/j.patrec.2016.02.014
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a methodology to address lexical disambiguation in a standard phrase-based statistical machine translation system. Similarity among source contexts is used to select appropriate translation units. This information is introduced as a novel feature of the phrase-based model, and it is used to select the translation units extracted from the training sentences most similar to the sentence being translated. The similarity is computed through a deep autoencoder representation, which yields an effective low-dimensional embedding of the data and statistically significant BLEU score improvements on two different tasks (English-to-Spanish and English-to-Hindi). © 2016 Elsevier B.V. All rights reserved.
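The idea of a source-context similarity feature can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that every sentence has already been mapped to a low-dimensional vector (in the paper this comes from a deep autoencoder; here the vectors are hand-written toy values), that similarity is the cosine between vectors, and that a translation unit is scored by its most similar originating training sentence. The function names, the toy vectors, and the example units are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def source_context_feature(input_vec, origin_vecs):
    """Score a translation unit by the highest similarity between the input
    sentence and any training sentence the unit was extracted from.
    Taking the maximum is an assumption made for this sketch."""
    return max((cosine(input_vec, v) for v in origin_vecs), default=0.0)


# Hypothetical toy embeddings standing in for autoencoder outputs.
input_vec = [1.0, 0.0]  # input sentence, e.g. about finance
candidates = {
    "bank -> banco": [[0.9, 0.1]],   # unit seen in a finance-like training sentence
    "bank -> orilla": [[0.0, 1.0]],  # unit seen in a river-like training sentence
}

scores = {unit: source_context_feature(input_vec, vecs)
          for unit, vecs in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In a phrase-based decoder, a score like this would typically enter the log-linear model as one more feature alongside the usual translation and language model scores, with its weight tuned on development data.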
Pages: 24-29
Number of pages: 6
Related papers
2 items in total
  • [1] Context Sensitive Word Deletion Model for Statistical Machine Translation
    Li, Qiang
    Han, Yaqian
    Xiao, Tong
    Zhu, Jingbo
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA, CCL 2017, 2017, 10565 : 73 - 84
  • [2] Routing Based Context Selection for Document-Level Neural Machine Translation
    Fei, Weilun
    Jian, Ping
    Zhu, Xiaoguang
    Lin, Yi
    MACHINE TRANSLATION, CCMT 2021, 2021, 1464 : 77 - 91