Information Dropping Data Augmentation for Machine Translation Quality Estimation

Cited by: 1

Authors:

Li, Shuo ^{[1]}

Bi, Xiaojun ^{[2,3]}

Liu, Tao ^{[4]}

Chen, Zheng ^{[2,3]}

Affiliations:

[1] Harbin Engn Univ, Coll Informat & Commun Engn, Harbin 150001, Peoples R China

[2] Minzu Univ China, Key Lab Ethn Language Intelligent Anal & Secur Gov, Beijing 100086, Peoples R China

[3] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China

[4] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2024 / Vol. 32

Funding:

National Natural Science Foundation of China;

Keywords:

Machine translation; Data augmentation; Data models; Computational modeling; Training data; Estimation; Training; information entropy; machine translation; pseudo label; quality estimation;

DOI:

10.1109/TASLP.2024.3380996

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Machine translation quality estimation (QE) is the assessment of machine translation quality without a reference translation. Supervised QE models based on neural networks have achieved state-of-the-art results. However, these models require large-scale training data whose high-quality labels must be created by bilingual experts, which is often very costly. Therefore, we propose a sentence-level machine translation QE data augmentation method based on information dropping. First, we calculate the information carried by each subword of the target translation using a conditional language model. Next, some subwords in the target translation are randomly deleted or replaced. We then obtain a pseudo quality score by calculating the information that remains. Finally, the original and augmented data are combined to train the final model. This pseudo-data generation method, based on an information dropping strategy, yields more faithful and diverse training samples without requiring additional corpus resources. Experimental results show that the method improves the correlation with human judgment by an average of 5.96% across the seven translation directions of the MLQE-PE dataset, while also improving the model's robustness to low-adequacy samples. In addition, the method does not require any modifications to the model architecture.
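The augmentation steps in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes that a subword's information is its surprisal, -log p, under the conditional language model, and that the pseudo score is the original (non-negative) score scaled by the fraction of information that remains. The names `augment_sample`, `drop_rate`, `replace_prob`, and `vocab` are hypothetical, and the subword probabilities are passed in as plain numbers rather than computed by a real model.

```python
import math
import random


def subword_information(probs):
    """Surprisal (in nats) of each target subword: -log p(y_t | source, y_<t).

    Assumption: "information" means surprisal under a conditional LM.
    """
    return [-math.log(p) for p in probs]


def augment_sample(subwords, probs, score, drop_rate=0.15,
                   replace_prob=0.5, vocab=("<unk>",), rng=None):
    """Randomly delete or replace subwords and derive a pseudo quality score.

    Assumption: pseudo score = original score * (kept info / total info).
    All parameter names and defaults here are hypothetical.
    """
    rng = rng or random.Random(0)
    info = subword_information(probs)
    total_info = sum(info)

    new_subwords = []
    kept_info = 0.0
    for token, token_info in zip(subwords, info):
        if rng.random() < drop_rate:
            # The information of this subword is treated as dropped.
            if rng.random() < replace_prob:
                new_subwords.append(rng.choice(vocab))  # replace
            # otherwise: delete (append nothing)
        else:
            new_subwords.append(token)
            kept_info += token_info

    # If every subword was fully predictable, there is no information to lose.
    ratio = kept_info / total_info if total_info > 0 else 1.0
    return new_subwords, score * ratio


if __name__ == "__main__":
    tokens = ["▁das", "▁ist", "▁gut"]
    lm_probs = [0.5, 0.25, 0.5]  # made-up probabilities for illustration
    print(augment_sample(tokens, lm_probs, score=0.8, drop_rate=0.5))
```

Under these assumptions, dropping a highly predictable subword lowers the pseudo score only slightly, while dropping a surprising one lowers it more. The augmented pairs would then be mixed with the original labeled data for training, as the abstract describes.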


Pages: 2112 - 2124

Page count: 13

50 entries in total

[1] Target Oriented Data Generation for Quality Estimation of Machine Translation
Wu, Huanqin
Yang, Muyun
Wang, Jiaqi
Zhu, Junguo
Zhao, Tiejun
NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING (NLPCC 2019), PT I, 2019, 11838 : 393 - 405
[2] A Data Augmentation Method for English-Vietnamese Neural Machine Translation
Pham, Nghia Luan
Nguyen, Van Vinh
Pham, Thang Viet
IEEE ACCESS, 2023, 11 : 28034 - 28044
[3] An Oblivious Approach to Machine Translation Quality Estimation
Elmakias, Itamar
Vilenchik, Dan
MATHEMATICS, 2021, 9 (17)
[4] Enhancing machine translation with quality estimation and reinforcement learning
Yang, Zijian Gyozo
Laki, Laszlo Janos
ANNALES MATHEMATICAE ET INFORMATICAE, 2023, 58 : 182 - 190
[5] Dimensionality reduction methods for machine translation quality estimation
Gonzalez-Rubio, Jesus
Ramon Navarro-Cerdan, J.
Casacuberta, Francisco
MACHINE TRANSLATION, 2013, 27 (3-4) : 281 - 301
[6] Improving Data Driven Inverse Text Normalization using Data Augmentation and Machine Translation
Paul, Debjyoti
Pang, Yutong
Chen, Szu-Jui
Zhang, Xuedong
INTERSPEECH 2022, 2022 : 5221 - 5222
[7] QUALES: Machine Translation Quality Estimation via Supervised and Unsupervised Machine Learning
Etchegoyhen, Thierry
Martinez Garcia, Eva
Azpeitia, Andoni
Alegria, Inaki
Labaka, Gorka
Otegi, Arantza
Sarasola, Kepa
Cortes, Itziar
Jauregi, Amaia
Ellakuria, Igor
Calonge, Eusebi
Martin, Maite
PROCESAMIENTO DEL LENGUAJE NATURAL, 2018, (61) : 143 - 146
[8] Learning Bilingual Sentence Representations for Quality Estimation of Machine Translation
Zhu, Junguo
Yang, Muyun
Li, Sheng
Zhao, Tiejun
MACHINE TRANSLATION, 2016, 668 : 35 - 42
[9] Word-Level Quality Estimation for Korean-English Neural Machine Translation
Eo, Sugyeong
Park, Chanjun
Moon, Hyeonseok
Seo, Jaehyung
Lim, Heuiseok
IEEE ACCESS, 2022, 10 : 44964 - 44973
[10] Neural approach-based quality estimation in improving translation of English to Hindi using machine translation under data science
Chouhan, Mansi
Srivastava, Devesh Kumar
2021 INTERNATIONAL CONFERENCE ON COMPUTATIONAL PERFORMANCE EVALUATION (COMPE-2021), 2021 : 35 - 39
