Information Dropping Data Augmentation for Machine Translation Quality Estimation

Cited by: 1
Authors
Li, Shuo [1 ]
Bi, Xiaojun [2 ,3 ]
Liu, Tao [4 ]
Chen, Zheng [2 ,3 ]
Affiliations
[1] Harbin Engn Univ, Coll Informat & Commun Engn, Harbin 150001, Peoples R China
[2] Minzu Univ China, Key Lab Ethn Language Intelligent Anal & Secur Gov, Beijing 100086, Peoples R China
[3] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China
[4] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Machine translation; Data augmentation; Data models; Computational modeling; Training data; Estimation; Training; information entropy; machine translation; pseudo label; quality estimation;
DOI
10.1109/TASLP.2024.3380996
CLC Number
O42 [Acoustics];
Subject Classification Code
070206 ; 082403 ;
Abstract
Machine translation quality estimation (QE) refers to assessing the quality of machine translations without a reference translation. Supervised QE models based on neural networks have achieved state-of-the-art results, but they require large-scale training data whose high-quality labels must be produced by bilingual experts, which is often very costly. We therefore propose a sentence-level machine translation QE data augmentation method based on information dropping. First, we calculate the information content of the subwords in the target translation using a conditional language model. Then, some subwords in the target translation are randomly deleted or replaced, and a pseudo quality score is obtained by calculating the remaining information. Finally, the original and augmented data are combined to train the final model. This pseudo-data generation strategy based on information dropping yields more faithful and diverse training samples without requiring additional corpus resources. Experimental results show that the method improves the correlation with human judgment by an average of 5.96% across the seven translation directions of the MLQE-PE dataset, while also improving the model's robustness to low-adequacy samples. In addition, the method requires no modification to the model architecture.
Pages: 2112 - 2124
Number of pages: 13
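To make the procedure described in the abstract concrete, the following is a minimal sketch of the information-dropping idea: per-subword information is taken from a conditional language model, some subwords are randomly deleted or replaced, and a pseudo quality score is derived from the information that remains. The placeholder function `token_log_probs`, the drop/replace ratios, and the exact score formula are illustrative assumptions, not the authors' implementation.

```python
# Sketch of information-dropping pseudo-data generation for sentence-level QE.
# All names and numeric choices below are illustrative assumptions.
import math
import random


def token_log_probs(source_tokens, target_tokens):
    """Placeholder for a conditional language model: per-subword log-probabilities
    of the target given the source. Replace with a real cross-lingual LM."""
    # Hypothetical uniform probabilities, for demonstration only.
    return [math.log(0.5) for _ in target_tokens]


def information_dropping(source_tokens, target_tokens, drop_ratio=0.2, replace_token="<unk>"):
    """Randomly delete or replace subwords in the target and compute a pseudo
    quality score from the fraction of information content that is retained."""
    log_probs = token_log_probs(source_tokens, target_tokens)
    info = [-lp for lp in log_probs]          # per-subword information content
    total_info = sum(info)

    augmented, kept_info = [], 0.0
    for tok, i in zip(target_tokens, info):
        if random.random() < drop_ratio:
            # Corrupt this subword: delete it or replace it with a noise token.
            if random.random() < 0.5:
                continue                      # deletion: the subword is dropped
            augmented.append(replace_token)   # replacement: position kept, information lost
        else:
            augmented.append(tok)
            kept_info += i                    # information preserved in the augmented sentence

    pseudo_score = kept_info / total_info if total_info > 0 else 0.0
    return augmented, pseudo_score


if __name__ == "__main__":
    src = "das ist ein Test".split()
    tgt = "this is a test".split()
    aug, score = information_dropping(src, tgt)
    print(aug, round(score, 3))
```

The augmented sentence and its pseudo score would then be added to the original labeled data when training the QE model, as the abstract describes; no change to the model architecture is needed.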