Using syntax for improving phrase-based SMT in low-resource languages

Cited by: 2
Authors
Fadaei, Hakimeh [1]
Faili, Heshaam [1, 2]
Affiliations
[1] Univ Tehran, Sch Elect & Comp Engn, Coll Engn, Tehran, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
Funding
National Science Foundation (United States);
Keywords
MODEL;
DOI
10.1093/llc/fqz033
Chinese Library Classification number
C [Social Sciences, General];
Subject classification code
03 ; 0303 ;
Abstract
Data-driven approaches to machine translation, such as statistical and neural machine translation, suffer from data sparsity when dealing with low-resource languages. In these cases, using other sources of information, including linguistic information, can alleviate the problem. In this article, we focus on the problem of word ordering in translation from a high-resource to a low-resource language and try to improve translation quality by using syntactic information from the high-resource side. We propose syntactic features based on Tree Adjoining Grammar (TAG) to be employed in a phrase-based SMT model in order to improve word ordering. In this work, a set of synchronous TAG rules is extracted and used to estimate the probability of the phrase orders suggested by the phrase-based model. The main idea of the article is to handle word ordering by using the extended domain of locality property of TAG and abstracting long-distance dependencies into a local view, namely a TAG elementary tree. The experiments on English-Persian and English-German translation show that, by combining the proposed TAG-based reordering features with lexical and hierarchical reordering models, we gain significant improvements over the baseline and in comparison with a neural reordering model and a pre-reordering model.
Pages: 507-528
Page count: 22
Related papers
50 in total
  • [31] Classifying educational materials in low-resource languages
    Sohsah, Gihad N.
    Guzey, Onur
    Tarmanini, Zaina
    2016 15TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2016), 2016, : 431 - 435
  • [32] GlotLID: Language Identification for Low-Resource Languages
    Kargaran, Amir Hossein
    Imani, Ayyoob
    Yvon, Francois
    Schuetze, Hinrich
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 6155 - 6218
  • [33] Discourse annotation guideline for low-resource languages
    Vargas, Francielle
    Schmeisser-Nieto, Wolfgang
    Rabinovich, Zohar
    Pardo, Thiago A. S.
    Benevenuto, Fabricio
    NATURAL LANGUAGE PROCESSING, 2025, 31 (02): : 700 - 743
  • [34] Extending Multilingual BERT to Low-Resource Languages
    Wang, Zihan
    Karthikeyan, K.
    Mayhew, Stephen
    Roth, Dan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2649 - 2656
  • [35] Attention is all low-resource languages need
    Poupard, Duncan
    TRANSLATION STUDIES, 2024, 17 (02) : 424 - 427
  • [36] Enhancing the quality of Phrase-table in Statistical Machine Translation for Less-Common and Low-Resource Languages
    Minh-Thuan Nguyen
    Van-Tan Bui
    Huy-Hien Vu
    Phuong-Thai Nguyen
    Chi-Mai Luong
    2018 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2018, : 165 - 170
  • [37] Improving Phrase-based Korean-English Statistical Machine Translation
    Lee, Jonghoon
    Lee, Donghyeon
    Lee, Gary Geunbae
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 753 - 756
  • [38] Improving phrase-based statistical translation through combination of word alignments
    Chen, Boxing
    Federico, Marcello
    ADVANCES IN NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4139 : 356 - 367
  • [39] Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary
    Fang, Meng
    Cohn, Trevor
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 587 - 593
  • [40] Leveraging Additional Resources for Improving Statistical Machine Translation on Asian Low-Resource Languages
    Hai-Long Trieu
    Duc-Vu Tran
    Ittoo, Ashwin
    Le-Minh Nguyen
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2019, 18 (03)