Extracting Bilingual Multi-word Expressions for Low-resource Statistical Machine Translation

Authors
Wei, Linyu [1,2]
Li, Miao [2]
Chen, Lei [2]
Yang, Zhenxin [1,2]
Sun, Kai [1,2]
Yuan, Man [3]
Affiliations
[1] Univ Sci & Technol China, Dept Automat, Hefei, Peoples R China
[2] Chinese Acad Sci, IIM, Inst Intelligent Machines, Hefei, Peoples R China
[3] Hefei Univ Technol, HFUT, Hefei, Peoples R China
Source
PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING | 2015
Keywords
Statistical machine translation; Multi-word expression extraction; Low-resource language pair
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Improving the performance of statistical machine translation (SMT) is a significant challenge, especially in low-resource scenarios such as Chinese-Mongolian SMT. In this paper, we propose a method that uses multi-word expressions (MWEs) to improve a Chinese-Mongolian SMT system; this is also a pilot study for this language pair. We extract MWEs from the phrase table and then integrate them into the SMT system using various strategies. Experimental results indicate that our method outperforms a baseline model by 0.81 BLEU points on Test-All and 1.54 BLEU points on Test-MWE.
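
The abstract does not give the extraction criteria, so the following is only a minimal sketch of what filtering a phrase table for bilingual MWE candidates might look like. It assumes a Moses-style text format (`source ||| target ||| scores`), assumes the third score is the direct phrase translation probability, and uses hypothetical thresholds (`min_tokens`, `min_prob`); none of these details come from the paper.

```python
from typing import Iterable, List, Tuple


def extract_mwe_candidates(
    phrase_table_lines: Iterable[str],
    min_tokens: int = 2,    # hypothetical: both sides must be multi-word
    min_prob: float = 0.5,  # hypothetical probability threshold
) -> List[Tuple[str, str, float]]:
    """Return (source, target, score) triples that look like bilingual MWEs.

    Assumes each line is 'source ||| target ||| scores ...' and that the
    third score is the direct phrase translation probability. The paper's
    actual criteria are not stated in the abstract.
    """
    candidates = []
    for line in phrase_table_lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 3:
            continue  # skip malformed lines
        source, target = fields[0], fields[1]
        scores = fields[2].split()
        if len(scores) < 3:
            continue  # not enough scores to read the assumed column
        try:
            prob = float(scores[2])
        except ValueError:
            continue  # skip lines with non-numeric scores
        # Keep only pairs where both sides contain several tokens.
        if len(source.split()) < min_tokens or len(target.split()) < min_tokens:
            continue
        if prob >= min_prob:
            candidates.append((source, target, prob))
    return candidates


# Toy input with placeholder tokens (not real Chinese-Mongolian data).
toy_table = [
    "a b ||| x y ||| 0.1 0.2 0.8 0.3",
    "a ||| x ||| 0.9 0.9 0.9 0.9",
    "c d ||| z w ||| 0.1 0.1 0.2 0.1",
]
print(extract_mwe_candidates(toy_table))
```

A real system would likely add linguistic or statistical association filters on top of this; the sketch only illustrates the phrase-table filtering step named in the abstract.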
Pages: 21-24
Number of pages: 4