Information Dropping Data Augmentation for Machine Translation Quality Estimation

Times cited: 1
Authors
Li, Shuo [1]
Bi, Xiaojun [2, 3]
Liu, Tao [4]
Chen, Zheng [2, 3]
Affiliations
[1] Harbin Engn Univ, Coll Informat & Commun Engn, Harbin 150001, Peoples R China
[2] Minzu Univ China, Key Lab Ethn Language Intelligent Anal & Secur Gov, Beijing 100086, Peoples R China
[3] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China
[4] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Machine translation; Data augmentation; Data models; Computational modeling; Training data; Estimation; Training; Information entropy; Pseudo label; Quality estimation
DOI
10.1109/TASLP.2024.3380996
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Machine translation quality estimation (QE) is the task of assessing the quality of machine translations without access to a reference translation. Supervised QE models based on neural networks have achieved state-of-the-art results, but they require large-scale training data whose high-quality labels must be created by bilingual experts, which is often very costly. We therefore propose a sentence-level QE data augmentation method based on information dropping. First, we compute the information of each subword in the target translation using a conditional language model. Next, we randomly delete or replace some subwords in the target translation and obtain a pseudo quality score by calculating the information that remains. Finally, we combine the original and augmented data to train the final model. This information-dropping strategy for generating pseudo data yields more faithful and diverse training samples without requiring additional corpus resources. Experimental results show that the method improves the correlation with human judgments by an average of 5.96% across the seven translation directions of the MLQE-PE dataset, while also improving the model's robustness to samples with low adequacy. In addition, the method requires no modification to the model architecture.
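The augmentation pipeline summarized in the abstract can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes that the information of a subword is its surprisal, -log p(y_t | x, y_<t), taken from a conditional language model (represented here by precomputed log-probabilities); that deleted and replaced subwords both lose their information; and that the pseudo score is the original score scaled by the fraction of information that remains. The names `QESample`, `subword_information`, `drop_information`, and `augment`, the `drop_rate` and `copies` parameters, and the 50/50 split between deletion and replacement are all illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class QESample:
    source: str
    target: list[str]  # target translation as subword tokens
    score: float       # sentence-level quality score (assumed 0-100 DA scale)


def subword_information(log_probs: list[float]) -> list[float]:
    """Surprisal (in nats) of each target subword: -log p(y_t | x, y_<t).

    Hypothetical input: log-probabilities from a conditional language model
    that scores the target translation given the source sentence.
    """
    return [-lp for lp in log_probs]


def drop_information(sample: QESample, info: list[float], drop_rate: float,
                     vocab: list[str], rng: random.Random) -> QESample:
    """Randomly delete or replace subwords and rescale the quality score.

    Assumption: a deleted or replaced subword loses all of its information,
    and the pseudo score is the original score times the retained ratio.
    """
    new_target: list[str] = []
    remaining = 0.0
    for token, token_info in zip(sample.target, info):
        if rng.random() < drop_rate:
            if rng.random() < 0.5:
                continue                          # delete the subword
            new_target.append(rng.choice(vocab))  # replace it with a random subword
        else:
            new_target.append(token)              # keep the subword and its information
            remaining += token_info
    total = sum(info)
    ratio = remaining / total if total > 0 else 1.0
    return QESample(sample.source, new_target, sample.score * ratio)


def augment(data: list[QESample], log_probs: list[list[float]], vocab: list[str],
            copies: int = 2, drop_rate: float = 0.15, seed: int = 0) -> list[QESample]:
    """Return the original samples followed by `copies` pseudo samples per original."""
    rng = random.Random(seed)
    pseudo: list[QESample] = []
    for sample, lps in zip(data, log_probs):
        info = subword_information(lps)
        for _ in range(copies):
            pseudo.append(drop_information(sample, info, drop_rate, vocab, rng))
    return data + pseudo  # combined training set: original + augmented data


if __name__ == "__main__":
    # Toy example with made-up subword probabilities.
    data = [QESample("Der Hund schläft.", ["▁The", "▁dog", "▁sleep", "s", "."], 85.0)]
    probs = [[0.6, 0.4, 0.3, 0.9, 0.95]]
    log_probs = [[math.log(p) for p in row] for row in probs]
    for s in augment(data, log_probs, vocab=["▁cat", "▁runs"], copies=3, drop_rate=0.3):
        print(" ".join(s.target), round(s.score, 2))
```

Scaling by the retained-information ratio keeps pseudo scores on the same scale as the human labels, so original and augmented samples can be mixed in one training set without changing the QE model. High-surprisal subwords carry more information, so dropping them lowers the pseudo score more than dropping predictable ones. The abstract does not state the exact score mapping, so this linear rescaling is only one plausible choice.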
Pages: 2112-2124
Number of pages: 13