Improving English-Arabic statistical machine translation with morpho-syntactic and semantic word class

Authors
Khemakhem I.T. [1]
Jamoussi S. [1]
Hamadou A.B. [1]
Affiliation
[1] MIRACL Laboratory, University of Sfax
Keywords
Alignment; Morpho-syntactic word classes; Semantic word classes; SMT; Statistical machine translation
DOI
10.1504/IJISTA.2020.107225
Abstract
In this paper, we present a new method for extracting and integrating morpho-syntactic and semantic word classes in a statistical machine translation (SMT) context to improve the quality of English-Arabic translation. The method can be applied across different statistical machine translation systems and to languages with complicated morphological paradigms. We first identify morpho-syntactic word classes and use them to build our statistical language model. We then apply a semantic word clustering algorithm to English. The resulting semantic word classes are projected from the English side to the featured Arabic side, using the word alignments produced in the alignment step by the GIZA++ tool. Finally, we apply a new process that incorporates the semantic classes in order to improve SMT quality. We show the method's efficacy on small and larger English-to-Arabic translation tasks. The experimental results show that introducing morpho-syntactic and semantic word classes yields a 7.7% relative improvement in BLEU score. © 2020 Inderscience Enterprises Ltd.
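The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the alignments have already been converted to "i-j" index pairs (source position, then target position), which is not the native GIZA++ output format, and that each Arabic word receives the class most often linked to it, a majority-vote rule chosen here purely for illustration. The function names `parse_alignment` and `project_classes` and the class labels are hypothetical.

```python
from collections import Counter, defaultdict


def parse_alignment(line):
    """Parse a string of 'i-j' pairs into (source_index, target_index) tuples.

    Assumption: alignments were exported as zero-based 'i-j' pairs.
    """
    return [tuple(int(k) for k in pair.split("-")) for pair in line.split()]


def project_classes(corpus, src_classes):
    """Project source-side word classes onto target words through alignment links.

    corpus: iterable of (source_tokens, target_tokens, links) triples.
    src_classes: dict mapping a source word to its class label.
    Returns a dict mapping each aligned target word to the class it was
    linked to most often (majority vote, an illustrative assumption).
    """
    votes = defaultdict(Counter)
    for src_tokens, tgt_tokens, links in corpus:
        for i, j in links:
            cls = src_classes.get(src_tokens[i])
            if cls is not None:  # skip source words that have no class
                votes[tgt_tokens[j]][cls] += 1
    return {tgt: counts.most_common(1)[0][0] for tgt, counts in votes.items()}


# Toy example: one sentence pair with hypothetical class labels.
english = ["big", "house"]
arabic = ["بيت", "كبير"]
links = parse_alignment("0-1 1-0")
english_classes = {"big": "ADJ_SIZE", "house": "BUILDING"}

arabic_classes = project_classes([(english, arabic, links)], english_classes)
print(arabic_classes)
```

Unaligned Arabic words receive no class in this sketch; a real system would need a fallback for them, and the abstract does not say how the authors handle that case.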
Pages: 172-190
Page count: 18