Factored bilingual n-gram language models for statistical machine translation

Cited by: 3
Authors
Crego, Josep M. [1]
Yvon, Francois [1, 2]
Affiliations
[1] LIMSI CNRS, BP 133, F-91430 Orsay, France
[2] Univ Paris 11, F-91430 Orsay, France
Keywords
Statistical machine translation; Bilingual n-gram language models; Factored language models
DOI
10.1007/s10590-010-9082-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we present an extension of n-gram-based translation models based on factored language models (FLMs). Translation units employed in the n-gram-based approach to statistical machine translation (SMT) are based on mappings of sequences of raw words, while translation model probabilities are estimated through standard language modeling of such bilingual units. Therefore, as in other translation model approaches (phrase-based or hierarchical), the sparseness of the units being modeled leads to unreliable probability estimates, even when large bilingual corpora are available. To tackle this problem, we extend the n-gram-based approach to SMT by tightly integrating more general word representations, such as lemmas and morphological classes, and we use the flexible framework of FLMs to apply a number of different back-off techniques. We show that FLMs can also be successfully applied to translation modeling, yielding more robust probability estimates that integrate larger bilingual contexts during the translation process.
Pages: 159-175
Page count: 17
Related papers
50 in total
  • [31] Integration of Bilingual Lists for Domain-Specific Statistical Machine Translation for Sinhala-Tamil
    Farhath, Fathima
    Ranathunga, Surangika
    Jayasena, Sanath
    Dias, Gihan
    2018 MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON) 4TH INTERNATIONAL MULTIDISCIPLINARY ENGINEERING RESEARCH CONFERENCE, 2018, : 538 - 543
  • [32] Extracting Bilingual Multi-word Expressions for Low-resource Statistical Machine Translation
    Wei, Linyu
    Li, Miao
    Chen, Lei
    Yang, Zhenxin
    Sun, Kai
    Yuan, Man
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2015, : 21 - 24
  • [33] Graph-based Lexicalized Reordering Models for Statistical Machine Translation
    Su Jinsong
    Liu Yang
    Liu Qun
    Dong Huailin
    CHINA COMMUNICATIONS, 2014, 11 (05) : 71 - 82
  • [34] Improving Reordering Models with Phrase Number Feature for Statistical Machine Translation
    Noormohammadi, Neda
    Rahimi, Zahra
    Khadivi, Shahram
    ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING, AISP 2013, 2014, 427 : 227 - 233
  • [35] Seal: Efficient Training Large Scale Statistical Machine Translation Models on Spark
    Gu, Rong
    Chen, Min
    Yang, Wenjia
    Yuan, Chunfeng
    Huang, Yihua
    2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018), 2018, : 118 - 125
  • [36] A comparison of segmentation methods and extended lexicon models for Arabic statistical machine translation
    Hasan, Sasa
    Mansour, Saab
    Ney, Hermann
    MACHINE TRANSLATION, 2012, 26 (1-2) : 47 - 65
  • [37] STATISTICAL VERSUS NEURAL MACHINE TRANSLATION - A CASE STUDY FOR A MEDIUM SIZE DOMAIN-SPECIFIC BILINGUAL CORPUS
    Jassem, Krzysztof
    Dwojak, Tomasz
    POZNAN STUDIES IN CONTEMPORARY LINGUISTICS, 2019, 55 (02) : 491 - 519
  • [38] Syntax-Based Chinese-Vietnamese Tree-to-Tree Statistical Machine Translation with Bilingual Features
    Gao, Shengxiang
    Huang, Jihao
    Xue, Mingya
    Yu, Zhengtao
    Wang, Zhuo
    Zhang, Yang
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2019, 18 (04)
  • [39] Integrating source-language context into phrase-based statistical machine translation
    Haque, Rejwanul
    Naskar, Sudip Kumar
    van den Bosch, Antal
    Way, Andy
    MACHINE TRANSLATION, 2011, 25 (03) : 239 - 285
  • [40] Reduction of Morpho-Syntactic Features in Statistical Machine Translation of Highly Inflective Language
    Maucec, Mirjam Sepesy
    Brest, Janez
    INFORMATICA, 2010, 21 (01) : 95 - 116