Factored bilingual n-gram language models for statistical machine translation

Cited by: 3

Authors:

Crego, Josep M. [1]

Yvon, Francois [1,2]

Affiliations:

[1] LIMSI CNRS, BP 133, F-91430 Orsay, France

[2] Univ Paris 11, F-91430 Orsay, France

Source:

MACHINE TRANSLATION | 2010 / Vol. 24 / No. 02

Keywords:

Statistical machine translation; Bilingual n-gram language models; Factored language models;

DOI:

10.1007/s10590-010-9082-5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this work, we present an extension of n-gram-based translation models based on factored language models (FLMs). Translation units employed in the n-gram-based approach to statistical machine translation (SMT) are based on mappings of sequences of raw words, while translation model probabilities are estimated through standard language modeling of such bilingual units. Therefore, similar to other translation model approaches (phrase-based or hierarchical), the sparseness problem of the units being modeled leads to unreliable probability estimates, even under conditions where large bilingual corpora are available. In order to tackle this problem, we extend the n-gram-based approach to SMT by tightly integrating more general word representations, such as lemmas and morphological classes, and we use the flexible framework of FLMs to apply a number of different back-off techniques. In this work, we show that FLMs can also be successfully applied to translation modeling, yielding more robust probability estimates that integrate larger bilingual contexts during the translation process.
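The back-off idea described in the abstract can be illustrated with a toy sketch. This is not the authors' system and not a real FLM implementation: the corpus, the function names (`surface`, `lemma`, `train`, `score`) and the fixed back-off weight `alpha` are all assumptions made for illustration. It scores a bilingual unit given the previous unit with a surface-form bigram model and, when that bigram is unseen, falls back to a lemma-level bigram with a fixed penalty (a simplified, unnormalized back-off rather than the generalized back-off that FLMs support).

```python
from collections import Counter

# A bilingual unit is (source, target); each side is a (word, lemma) pair.
# All data below is hypothetical toy data.

def surface(unit):
    """View a unit through its surface word forms."""
    (src_word, _), (tgt_word, _) = unit
    return (src_word, tgt_word)

def lemma(unit):
    """View a unit through its lemmas (a more general representation)."""
    (_, src_lemma), (_, tgt_lemma) = unit
    return (src_lemma, tgt_lemma)

def train(sentences, view):
    """Count history unigrams and bigrams of units under the given view."""
    uni, bi = Counter(), Counter()
    for units in sentences:
        seq = ["<s>"] + [view(u) for u in units]
        for prev, cur in zip(seq, seq[1:]):
            uni[prev] += 1
            bi[(prev, cur)] += 1
    return uni, bi

def score(prev, cur, models, alpha=0.4):
    """Relative frequency from the first model that has seen the bigram.

    Each back-off step multiplies the result by alpha, so the value is a
    score, not a normalized probability.
    """
    weight = 1.0
    for view, (uni, bi) in models:
        p = "<s>" if prev == "<s>" else view(prev)
        c = view(cur)
        if bi[(p, c)] > 0:
            return weight * bi[(p, c)] / uni[p]
        weight *= alpha
    return 0.0

sentences = [
    [(("maison", "maison"), ("house", "house")),
     (("bleue", "bleu"), ("blue", "blue"))],
    [(("maisons", "maison"), ("houses", "house")),
     (("bleues", "bleu"), ("blue", "blue"))],
]

# Back-off order: surface forms first, then lemmas.
models = [(surface, train(sentences, surface)),
          (lemma, train(sentences, lemma))]

house = (("maison", "maison"), ("house", "house"))
seen = score(house, (("bleue", "bleu"), ("blue", "blue")), models)
unseen = score(house, (("bleues", "bleu"), ("blue", "blue")), models)
print(seen, unseen)
```

In this toy corpus the second query pairs a singular noun unit with a plural adjective unit, a surface bigram that never occurs, so the score comes from the lemma-level counts scaled by `alpha`. This shows how more general representations can supply an estimate where surface units are too sparse.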


Pages: 159-175

Page count: 17

