Identifying bilingual Multi-Word Expressions for Statistical Machine Translation

Cited by: 0
Authors
Bouamor, Dhouha [1, 2, 3]
Semmar, Nasredine [1]
Zweigenbaum, Pierre [2, 3]
Affiliations
[1] CEA, LIST, Vis & Content Engn Lab, F-91191 Gif Sur Yvette, France
[2] CNRS, LIMSI, F-91403 Orsay, France
[3] Univ Paris 11, Orsay, France
Source
LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2012
Keywords
bilingual Multi-Word Expression; Vector Space Model; Statistical Machine Translation
DOI
Not available
Chinese Library Classification number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
Multi-Word Expressions (MWEs) represent a key issue for numerous applications in Natural Language Processing (NLP), especially for Machine Translation (MT). In this paper, we describe a strategy for detecting translation pairs of MWEs in a French-English parallel corpus. In addition, we introduce three methods aiming to integrate extracted bilingual MWEs in MOSES, a phrase-based Statistical Machine Translation (SMT) system. We experimentally show that these textual units can improve translation quality.
Pages: 674-679
Page count: 6