Factored bilingual n-gram language models for statistical machine translation

Cited by: 3

Authors:

Crego, Josep M. [1]

Yvon, Francois [1,2]

Affiliations:

[1] LIMSI CNRS, BP 133, F-91430 Orsay, France

[2] Univ Paris 11, F-91430 Orsay, France

Source:

MACHINE TRANSLATION | 2010 / Vol. 24 / Issue 02

Keywords:

Statistical machine translation; Bilingual n-gram language models; Factored language models

DOI:

10.1007/s10590-010-9082-5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this work, we present an extension of n-gram-based translation models based on factored language models (FLMs). Translation units employed in the n-gram-based approach to statistical machine translation (SMT) are based on mappings of sequences of raw words, while translation model probabilities are estimated through standard language modeling of such bilingual units. Therefore, similar to other translation model approaches (phrase-based or hierarchical), the sparseness problem of the units being modeled leads to unreliable probability estimates, even under conditions where large bilingual corpora are available. In order to tackle this problem, we extend the n-gram-based approach to SMT by tightly integrating more general word representations, such as lemmas and morphological classes, and we use the flexible framework of FLMs to apply a number of different back-off techniques. In this work, we show that FLMs can also be successfully applied to translation modeling, yielding more robust probability estimates that integrate larger bilingual contexts during the translation process.


Pages: 159-175

Page count: 17
