Information Dropping Data Augmentation for Machine Translation Quality Estimation

Cited by: 1
Authors
Li, Shuo [1]
Bi, Xiaojun [2, 3]
Liu, Tao [4]
Chen, Zheng [2, 3]
Affiliations
[1] Harbin Engn Univ, Coll Informat & Commun Engn, Harbin 150001, Peoples R China
[2] Minzu Univ China, Key Lab Ethn Language Intelligent Anal & Secur Gov, Beijing 100086, Peoples R China
[3] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China
[4] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Machine translation; Data augmentation; Data models; Computational modeling; Training data; Estimation; Training; information entropy; machine translation; pseudo label; quality estimation;
DOI
10.1109/TASLP.2024.3380996
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Machine translation quality estimation (QE) assesses the quality of machine translations without a reference translation. Supervised QE models based on neural networks have achieved state-of-the-art results, but they require large-scale training data whose high-quality labels must be created by bilingual experts, which is often very costly. We therefore propose a sentence-level data augmentation method for machine translation QE based on information dropping. First, we compute the information carried by each subword of the target translation using a conditional language model. Next, some subwords in the target translation are randomly deleted or replaced, and a pseudo quality score is obtained from the information that remains. Finally, the original and augmented data are combined to train the final model. This pseudo-data generation method, based on the information dropping strategy, yields more faithful and diverse training samples without requiring additional corpus resources. Experimental results show that the method improves the correlation with human judgment by an average of 5.96% across the seven translation directions of the MLQE-PE dataset, while also improving the model's robustness to low-adequacy samples. In addition, the method requires no modifications to the model architecture.
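The abstract does not give the exact scoring formula, so the following is only a minimal sketch of the deletion case under stated assumptions. It assumes each subword's information is its surprisal, -log p, under a conditional language model, and that the pseudo score is the original score scaled by the fraction of information that remains. The function names, the `drop_rate` parameter, and the scaling rule are hypothetical illustrations, not the authors' implementation. Replacement of subwords is not covered.

```python
import math
import random


def subword_information(probs):
    """Surprisal (-log p) per subword; probs are assumed to come from a
    conditional language model p(y_t | source, y_<t). Hypothetical helper."""
    return [-math.log(p) for p in probs]


def drop_and_score(subwords, probs, score, drop_rate=0.2, seed=0):
    """Randomly delete subwords and derive a pseudo quality score.

    Assumption (not from the paper): the pseudo score is the original score
    multiplied by the share of total information that survives deletion.
    """
    rng = random.Random(seed)
    info = subword_information(probs)
    total = sum(info)

    # Keep each subword independently with probability 1 - drop_rate.
    keep = [rng.random() >= drop_rate for _ in subwords]
    augmented = [w for w, k in zip(subwords, keep) if k]
    remaining = sum(i for i, k in zip(info, keep) if k)

    # Guard against a sentence that carries no information at all.
    ratio = remaining / total if total > 0 else 1.0
    return augmented, score * ratio


if __name__ == "__main__":
    tokens = ["▁the", "▁cat", "▁sat"]
    lm_probs = [0.9, 0.2, 0.5]  # made-up probabilities for illustration
    print(drop_and_score(tokens, lm_probs, score=0.8, drop_rate=0.3, seed=1))
```

Under this assumed rule, deleting a highly predictable subword (probability near 1) barely lowers the pseudo score, while deleting a surprising one lowers it substantially.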
Pages: 2112 - 2124
Page count: 13