Using syntax for improving phrase-based SMT in low-resource languages

Cited by: 2
Authors
Fadaei, Hakimeh [1]
Faili, Heshaam [1, 2]
Affiliations
[1] Univ Tehran, Sch Elect & Comp Engn, Coll Engn, Tehran, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
Funding
U.S. National Science Foundation
Keywords
MODEL
DOI
10.1093/llc/fqz033
Chinese Library Classification
C [Social Sciences, General]
Discipline classification code
03; 0303
Abstract
Data-driven approaches to machine translation, such as statistical and neural machine translation, suffer from data sparsity when dealing with low-resource languages. In these cases, other sources of information, including linguistic information, could alleviate the problem. In this article, we focus on word ordering in translation from a high-resource to a low-resource language and try to improve translation quality by using syntactic information from the high-resource side. We propose syntactic features based on Tree Adjoining Grammar (TAG) for use in a phrase-based SMT model to improve word ordering. A set of synchronous TAG rules is extracted and used to estimate the probability of the phrase orders suggested by the phrase-based model. The main idea is to handle word ordering by exploiting TAG's extended domain of locality, which abstracts long-distance dependencies into a local view, namely a TAG elementary tree. Experiments on English-Persian and English-German translation show that combining the proposed TAG-based reordering features with lexical and hierarchical reordering models yields significant improvements over the baseline, and over a neural reordering model and a pre-reordering model.
Pages: 507-528
Number of pages: 22