Factored bilingual n-gram language models for statistical machine translation

Cited by: 3
Authors
Crego, Josep M. [1]
Yvon, Francois [1, 2]
Affiliations
[1] LIMSI CNRS, BP 133, F-91430 Orsay, France
[2] Univ Paris 11, F-91430 Orsay, France
Keywords
Statistical machine translation; Bilingual n-gram language models; Factored language models
DOI
10.1007/s10590-010-9082-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we present an extension of n-gram-based translation models based on factored language models (FLMs). Translation units employed in the n-gram-based approach to statistical machine translation (SMT) are based on mappings of sequences of raw words, while translation model probabilities are estimated through standard language modeling of such bilingual units. Therefore, similar to other translation model approaches (phrase-based or hierarchical), the sparseness problem of the units being modeled leads to unreliable probability estimates, even under conditions where large bilingual corpora are available. In order to tackle this problem, we extend the n-gram-based approach to SMT by tightly integrating more general word representations, such as lemmas and morphological classes, and we use the flexible framework of FLMs to apply a number of different back-off techniques. In this work, we show that FLMs can also be successfully applied to translation modeling, yielding more robust probability estimates that integrate larger bilingual contexts during the translation process.
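The back-off idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the `LEMMA` table, the function names, and the fixed back-off penalty are all assumptions made for illustration, and the resulting scores are not normalized probabilities. Each bilingual unit is a (source word, target word) pair; when a bigram of surface units is unseen, the score falls back to a context in which the previous unit is replaced by its lemma-level form.

```python
from collections import Counter

# Hypothetical lemma table; a real system would use a morphological analyser.
LEMMA = {
    "les": "le", "chats": "chat", "chiens": "chien",
    "dorment": "dormir", "dort": "dormir",
    "cats": "cat", "dogs": "dog", "sleeps": "sleep",
}

# Toy sequences of bilingual units (source word, target word); invented data.
CORPUS = [
    [("les", "the"), ("chats", "cats"), ("dorment", "sleep")],
    [("le", "the"), ("chat", "cat"), ("dort", "sleeps")],
    [("les", "the"), ("chiens", "dogs"), ("dorment", "sleep")],
]


def lemmatize(unit):
    """Map a bilingual unit to its more general, lemma-level form."""
    src, tgt = unit
    return (LEMMA.get(src, src), LEMMA.get(tgt, tgt))


def train(sequences):
    """Count unit bigrams with a surface-form context and with a lemma context."""
    surf, surf_ctx, lem, lem_ctx = Counter(), Counter(), Counter(), Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            surf[(prev, cur)] += 1
            surf_ctx[prev] += 1
            general = lemmatize(prev)
            lem[(general, cur)] += 1
            lem_ctx[general] += 1
    return surf, surf_ctx, lem, lem_ctx


def score(model, prev, cur, penalty=0.4):
    """Score `cur` given `prev`, backing off from surface to lemma context.

    The fixed `penalty` is an assumption (a simple stand-in for the
    discounting schemes a factored language model would apply).
    """
    surf, surf_ctx, lem, lem_ctx = model
    if surf[(prev, cur)] > 0:
        return surf[(prev, cur)] / surf_ctx[prev]
    general = lemmatize(prev)
    if lem[(general, cur)] > 0:
        return penalty * lem[(general, cur)] / lem_ctx[general]
    return 0.0


model = train(CORPUS)
seen = score(model, ("les", "the"), ("chats", "cats"))
backed_off = score(model, ("le", "the"), ("chats", "cats"))
unseen = score(model, ("dort", "sleeps"), ("les", "the"))
print(seen, backed_off, unseen)
```

In this sketch the pair `("le", "the")` followed by `("chats", "cats")` never occurs in the toy corpus, yet it still receives a non-zero score because its lemma-level context is shared with `("les", "the")`. That is the kind of sparseness relief the abstract attributes to more general word representations.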
Pages: 159-175
Number of pages: 17