Factored bilingual n-gram language models for statistical machine translation

Cited by: 3

Authors:

Crego, Josep M. [1]

Yvon, Francois [1,2]

Affiliations:

[1] LIMSI CNRS, BP 133, F-91430 Orsay, France

[2] Univ Paris 11, F-91430 Orsay, France

Source:

MACHINE TRANSLATION | 2010 / Vol. 24 / Issue 02

Keywords:

Statistical machine translation; Bilingual n-gram language models; Factored language models

DOI:

10.1007/s10590-010-9082-5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In this work, we present an extension of n-gram-based translation models based on factored language models (FLMs). Translation units employed in the n-gram-based approach to statistical machine translation (SMT) are mappings between sequences of raw words, and translation model probabilities are estimated through standard language modeling of these bilingual units. As in other translation model approaches (phrase-based or hierarchical), the sparseness of the modeled units therefore leads to unreliable probability estimates, even when large bilingual corpora are available. To tackle this problem, we extend the n-gram-based approach to SMT by tightly integrating more general word representations, such as lemmas and morphological classes, and we use the flexible framework of FLMs to apply a number of different back-off techniques. We show that FLMs can also be successfully applied to translation modeling, yielding more robust probability estimates that integrate larger bilingual contexts during the translation process.
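
The back-off idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model or any FLM toolkit: the class `BackoffBilingualBigram`, the helper names, the fixed `penalty`, and the toy English-French data are all hypothetical. It scores a bilingual unit given the previous unit by surface-form relative frequency, and falls back to a discounted lemma-level estimate when the surface bigram was never observed. The scores are not normalized probabilities.

```python
from collections import Counter

# A bilingual unit is ((src_form, src_lemma), (tgt_form, tgt_lemma)).
# All names and data below are hypothetical, for illustration only.

def surface(unit):
    """Surface-form view of a unit: (source form, target form)."""
    return (unit[0][0], unit[1][0])

def lemma(unit):
    """More general view of a unit: (source lemma, target lemma)."""
    return (unit[0][1], unit[1][1])

class BackoffBilingualBigram:
    """Bigram scores over bilingual units with a fixed-penalty back-off
    from surface forms to lemmas (a simplification, not a true FLM)."""

    def __init__(self, sentences, penalty=0.4):
        self.penalty = penalty
        self.bigrams = {"surface": Counter(), "lemma": Counter()}
        self.contexts = {"surface": Counter(), "lemma": Counter()}
        for sent in sentences:
            for view, key in (("surface", surface), ("lemma", lemma)):
                seq = ["<s>"] + [key(u) for u in sent]
                for prev, cur in zip(seq, seq[1:]):
                    self.bigrams[view][(prev, cur)] += 1
                    self.contexts[view][prev] += 1

    def _rel_freq(self, view, prev, cur):
        total = self.contexts[view][prev]
        return self.bigrams[view][(prev, cur)] / total if total else 0.0

    def score(self, prev_unit, unit):
        # First try the most specific representation (surface forms).
        prev_s = "<s>" if prev_unit is None else surface(prev_unit)
        s = self._rel_freq("surface", prev_s, surface(unit))
        if s > 0:
            return s
        # Unseen surface bigram: back off to the lemma-level estimate.
        prev_l = "<s>" if prev_unit is None else lemma(prev_unit)
        return self.penalty * self._rel_freq("lemma", prev_l, lemma(unit))

u_the_la = (("the", "the"), ("la", "le"))
u_the_les = (("the", "the"), ("les", "le"))
u_house = (("house", "house"), ("maison", "maison"))
u_houses = (("houses", "house"), ("maisons", "maison"))

model = BackoffBilingualBigram([[u_the_la, u_house], [u_the_les, u_houses]])
print(model.score(u_the_la, u_house))   # seen surface bigram
print(model.score(u_the_la, u_houses))  # unseen on the surface, seen at lemma level
```

In this toy setting the second query gets a nonzero score only because the lemma-level counts pool evidence from both training sentences, which is the kind of robustness to sparse units that the abstract attributes to more general word representations.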


Pages: 159-175

Page count: 17

