Factored bilingual n-gram language models for statistical machine translation

Cited by: 3
Authors
Crego, Josep M. [1]
Yvon, Francois [1, 2]
Affiliations
[1] LIMSI CNRS, BP 133, F-91430 Orsay, France
[2] Univ Paris 11, F-91430 Orsay, France
Keywords
Statistical machine translation; Bilingual n-gram language models; Factored language models;
DOI
10.1007/s10590-010-9082-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present an extension of n-gram-based translation models that relies on factored language models (FLMs). Translation units employed in the n-gram-based approach to statistical machine translation (SMT) are mappings between sequences of raw words, and translation model probabilities are estimated through standard language modeling of these bilingual units. As a result, as with other translation model approaches (phrase-based or hierarchical), the sparseness of the units being modeled leads to unreliable probability estimates, even when large bilingual corpora are available. To tackle this problem, we extend the n-gram-based approach to SMT by tightly integrating more general word representations, such as lemmas and morphological classes, and we use the flexible framework of FLMs to apply a number of different back-off techniques. We show that FLMs can also be successfully applied to translation modeling, yielding more robust probability estimates that integrate larger bilingual contexts during the translation process.
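The back-off idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the corpus, the `word|translation` unit format, the two-factor layout (surface form, lemma), and the fixed penalty `ALPHA` are all assumptions made for illustration, and the fixed penalty is a simplification that does not produce a normalized distribution. The sketch scores a bilingual unit given the previous unit, first using the surface form of the history, then falling back to its lemma, then to a unigram estimate.

```python
from collections import Counter

SURFACE, LEMMA = 0, 1   # positions of the two factors in each unit (assumed layout)
ALPHA = 0.4             # assumed fixed back-off penalty; real FLMs use discounting

# Hypothetical toy corpus: each bilingual unit is (surface factor, lemma factor).
corpus = [
    [("we|nous", "we|nous"), ("want|voulons", "want|vouloir"),
     ("peace|la_paix", "peace|paix")],
    [("they|ils", "they|ils"), ("want|veulent", "want|vouloir"),
     ("peace|la_paix", "peace|paix")],
]

def train(sentences):
    """Collect bigram counts with the history expressed as surface form or lemma."""
    surf_bi, surf_hist = Counter(), Counter()
    lem_bi, lem_hist = Counter(), Counter()
    uni = Counter()
    for sent in sentences:
        prev = ("<s>", "<s>")  # sentence-start marker for both factors
        for unit in sent:
            cur = unit[SURFACE]
            surf_bi[(prev[SURFACE], cur)] += 1
            surf_hist[prev[SURFACE]] += 1
            lem_bi[(prev[LEMMA], cur)] += 1
            lem_hist[prev[LEMMA]] += 1
            uni[cur] += 1
            prev = unit
    return surf_bi, surf_hist, lem_bi, lem_hist, uni

def score(model, prev, cur_surface):
    """Score a surface unit given the previous unit, backing off to its lemma."""
    surf_bi, surf_hist, lem_bi, lem_hist, uni = model
    seen = surf_bi[(prev[SURFACE], cur_surface)]
    if seen > 0:  # surface-form history was observed with this unit
        return seen / surf_hist[prev[SURFACE]]
    seen = lem_bi[(prev[LEMMA], cur_surface)]
    if seen > 0:  # fall back to the more general lemma-level history
        return ALPHA * seen / lem_hist[prev[LEMMA]]
    # last resort: penalized unigram relative frequency
    return ALPHA * ALPHA * uni[cur_surface] / sum(uni.values())

model = train(corpus)
seen_history = ("want|voulons", "want|vouloir")
unseen_history = ("want|voudrions", "want|vouloir")  # surface form never observed
print(score(model, seen_history, "peace|la_paix"))
print(score(model, unseen_history, "peace|la_paix"))
```

In this toy setting, the history `want|voudrions` never occurs in the corpus, but its lemma `want|vouloir` does, so the second call still receives a context-dependent score instead of dropping straight to the unigram estimate.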
Pages: 159-175
Number of pages: 17