Terminology Translation in Low-Resource Scenarios

Cited by: 2
Authors
Haque, Rejwanul [1]
Hasanuzzaman, Mohammed [2]
Way, Andy [1]
Affiliations
[1] Dublin City Univ, Sch Comp, Dublin 9, Glasnevin, Ireland
[2] Cork Inst Technol, Dept Comp Sci, Cork T12 P928, Ireland
Funding
Science Foundation Ireland
Keywords
machine translation; terminology translation; phrase-based statistical machine translation; neural machine translation; terminology translation evaluation
DOI
10.3390/info10090273
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Evaluating term translation quality in machine translation (MT), which is usually done by domain experts, is a time-consuming and expensive task. In fact, manual evaluation is impractical in an industrial setting where customised MT systems often need to be updated for many reasons (e.g., availability of new training data, leading MT techniques). To the best of our knowledge, there is as yet no publicly available solution for evaluating terminology translation in MT automatically. Hence, there is a genuine need for a faster and less expensive solution to this problem, which could help end-users identify term translation problems in MT instantly. This study presents such a strategy for evaluating terminology translation in MT. High correlations of our evaluation results with human judgements demonstrate the effectiveness of the proposed solution. The paper also introduces a classification framework, TermCat, that can automatically classify term translation-related errors and expose specific problems in relation to terminology translation in MT. We carried out our experiments with a low-resource language pair, English-Hindi, and found that our classifier, whose accuracy varies across the translation directions, error classes, the morphological nature of the languages, and MT models, generally performs competently in the terminology translation classification task.
Pages: 28
Related papers
50 in total
  • [1] A Survey on Machine Translation of Low-Resource Arabic Dialects
    Abdul-Nabi, Razan
    Obeidat, Rasha
    Bsoul, Anas
    2024 15TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS, ICICS 2024, 2024,
  • [2] Translation Memories as Baselines for Low-Resource Machine Translation
    Knowles, Rebecca
    Littell, Patrick
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6759 - 6767
  • [3] Enabling Medical Translation for Low-Resource Languages
    Musleh, Ahmad
    Durrani, Nadir
    Temnikova, Irina
    Nakov, Preslav
    Vogel, Stephan
    Alsaad, Osama
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, (CICLING 2016), PT II, 2018, 9624 : 3 - 16
  • [4] Low-Resource Translation Quality Estimation for Estonian
    Yankovskaya, Elizaveta
    Fishel, Mark
    HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, BALTIC HLT 2018, 2018, 307 : 175 - 182
  • [5] Transformers for Low-resource Neural Machine Translation
    Gezmu, Andargachew Mekonnen
    Nuernberger, Andreas
    ICAART: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 1, 2022, : 459 - 466
  • [6] A Survey on Low-resource Neural Machine Translation
    Li H.-Z.
    Feng C.
    Huang H.-Y.
    Huang, He-Yan (hhy63@bit.edu.cn), 1600, Science Press (47): : 1217 - 1231
  • [7] Speech-to-speech Low-resource Translation
    Liu, Hsiao-Chuan
    Day, Min-Yuh
    Wang, Chih-Chien
    2023 IEEE 24TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE, IRI, 2023, : 91 - 95
  • [8] Decoding Strategies for Improving Low-Resource Machine Translation
    Park, Chanjun
    Yang, Yeongwook
    Park, Kinam
    Lim, Heuiseok
    ELECTRONICS, 2020, 9 (10) : 1 - 15
  • [9] Low-resource Neural Machine Translation: Methods and Trends
    Shi, Shumin
    Wu, Xing
    Su, Rihai
    Huang, Heyan
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (05)
  • [10] Neural Machine Translation for Low-resource Languages: A Survey
    Ranathunga, Surangika
    Lee, En-Shiun Annie
    Skenduli, Marjana Prifti
    Shekhar, Ravi
    Alam, Mehreen
    Kaur, Rishemjit
    ACM COMPUTING SURVEYS, 2023, 55 (11)