Factored bilingual n-gram language models for statistical machine translation

Cited by: 3
Authors
Crego, Josep M. [1]
Yvon, Francois [1, 2]
Affiliations
[1] LIMSI CNRS, BP 133, F-91430 Orsay, France
[2] Univ Paris 11, F-91430 Orsay, France
Keywords
Statistical machine translation; Bilingual n-gram language models; Factored language models
DOI
10.1007/s10590-010-9082-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we present an extension of n-gram-based translation models based on factored language models (FLMs). Translation units employed in the n-gram-based approach to statistical machine translation (SMT) are based on mappings of sequences of raw words, while translation model probabilities are estimated through standard language modeling of such bilingual units. Therefore, similar to other translation model approaches (phrase-based or hierarchical), the sparseness problem of the units being modeled leads to unreliable probability estimates, even under conditions where large bilingual corpora are available. In order to tackle this problem, we extend the n-gram-based approach to SMT by tightly integrating more general word representations, such as lemmas and morphological classes, and we use the flexible framework of FLMs to apply a number of different back-off techniques. In this work, we show that FLMs can also be successfully applied to translation modeling, yielding more robust probability estimates that integrate larger bilingual contexts during the translation process.
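The back-off idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy French-English data, the names `BackoffBigram`, `surface`, and `lemma`, and the fixed back-off weight of 0.4 are all assumptions made for illustration. The sketch scores a bilingual unit given its predecessor from surface-form counts and, when that bigram is unseen, falls back to counts over lemma pairs. The result is an unnormalized score in the style of a fixed-weight back-off, whereas the FLM framework described in the paper supports several properly smoothed back-off schemes.

```python
from collections import Counter

# Each bilingual unit is ((src_word, src_lemma), (tgt_word, tgt_lemma)).
# All data and names here are hypothetical, chosen only for illustration.

def surface(unit):
    """Project a unit onto its (source word, target word) pair."""
    (src_word, _), (tgt_word, _) = unit
    return (src_word, tgt_word)

def lemma(unit):
    """Project a unit onto its (source lemma, target lemma) pair."""
    (_, src_lemma), (_, tgt_lemma) = unit
    return (src_lemma, tgt_lemma)

class BackoffBigram:
    """Bigram scorer over bilingual units, backing off from words to lemmas."""

    def __init__(self, sequences, backoff_weight=0.4):
        self.alpha = backoff_weight  # assumed constant, not taken from the paper
        self.bigrams = {"surface": Counter(), "lemma": Counter()}
        self.histories = {"surface": Counter(), "lemma": Counter()}
        for seq in sequences:
            for name, view in (("surface", surface), ("lemma", lemma)):
                items = [view(u) for u in seq]
                for prev, cur in zip(items, items[1:]):
                    self.bigrams[name][(prev, cur)] += 1
                    self.histories[name][prev] += 1

    def _rel_freq(self, name, prev, cur):
        denom = self.histories[name][prev]
        return self.bigrams[name][(prev, cur)] / denom if denom else 0.0

    def score(self, prev_unit, unit):
        """Use the surface estimate if the bigram was seen, else the lemma one."""
        p = self._rel_freq("surface", surface(prev_unit), surface(unit))
        if p > 0:
            return p
        return self.alpha * self._rel_freq("lemma", lemma(prev_unit), lemma(unit))

corpus = [
    [(("les", "le"), ("the", "the")), (("chats", "chat"), ("cats", "cat"))],
    [(("le", "le"), ("the", "the")), (("chat", "chat"), ("cat", "cat"))],
]
model = BackoffBigram(corpus)

seen = model.score(corpus[0][0], corpus[0][1])    # surface bigram was observed
unseen = model.score(corpus[1][0], corpus[0][1])  # only the lemma bigram was observed
```

In this toy corpus the pair ("le", "the") followed by ("chats", "cats") never occurs as surface forms, but both training sequences share the same lemma-level bigram, so the second query still receives a nonzero score. This is the sparseness effect the abstract describes: more general representations provide evidence where raw word tuples do not.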
Pages: 159-175
Page count: 17
Related papers
50 in total
  • [21] Linguistic Factors in Statistical Machine Translation Involving Arabic Language
    Youssef, Islam
    Sakr, Mohamed
    Kouta, Mohamed
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2009, 9 (11) : 154 - 159
  • [22] Syntactic discriminative language model rerankers for statistical machine translation
    Carter, Simon
    Monz, Christof
    MACHINE TRANSLATION, 2011, 25 (04) : 317 - 339
  • [23] English Language Statistical Machine Translation Oriented Classification Algorithm
    Yan, Jia
    Chao, Wang
    2015 INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION, BIG DATA AND SMART CITY (ICITBS), 2016, : 376 - 379
  • [24] Statistical machine translation of subtitles for highly inflected language pair
    Maucec, Mirjam Sepesy
    Kacic, Zdravko
    Verdonik, Darinka
    PATTERN RECOGNITION LETTERS, 2014, 46 : 96 - 103
  • [25] Integrating Specialized Bilingual Lexicons of Multiword Expressions for Domain Adaptation in Statistical Machine Translation
    Semmar, Nasredine
    Laib, Meriama
    COMPUTATIONAL LINGUISTICS, PACLING 2017, 2018, 781 : 101 - 114
  • [26] Morphology generation for English-Indian language statistical machine translation
    S. Sreelekha
    Soft Computing, 2021, 25 : 3657 - 3664
  • [27] Morphology generation for English-Indian language statistical machine translation
    Sreelekha, S.
    SOFT COMPUTING, 2021, 25 (05) : 3657 - 3664
  • [28] A Language Acquisition Method Based on Statistical Machine Translation for Application to Robots
    Takabuchi, Kenta
    Iwahashi, Naoto
    Kunishima, Takeo
    2016 JOINT IEEE INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING AND EPIGENETIC ROBOTICS (ICDL-EPIROB), 2016, : 300 - 301
  • [29] Morphology in Statistical Machine Translation from English to a Highly Inflectional Language
    Maucec, Mirjam S.
    Donaj, Gregor
    INFORMATION TECHNOLOGY AND CONTROL, 2018, 47 (01) : 63 - 74
  • [30] Integrating Multi-source Bilingual Information for Chinese Word Segmentation in Statistical Machine Translation
    Chen, Wei
    Wei, Wei
    Chen, Zhenbiao
    Xu, Bo
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA, 2013, 8208 : 61 - 72