A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability

Cited by: 0

Authors:

Cao, Chengtai [1]

Zhou, Fan [2]

Dai, Yurou [1]

Wang, Jianping [1]

Zhang, Kunpeng [3]

Affiliations:

[1] City Univ Hong Kong, Hong Kong, Peoples R China

[2] Univ Elect Sci & Technol China, Chengdu, Peoples R China

[3] Univ Maryland Coll Pk, College Pk, MD USA

Source:

ACM COMPUTING SURVEYS | 2025 / Vol. 57 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Data augmentation; regularization; generalization; machine learning; deep learning;

DOI:

10.1145/3696206

Chinese Library Classification number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

Data augmentation (DA) is indispensable in modern machine learning and deep neural networks. The basic idea of DA is to construct new training data that improve the model's generalization, either by adding slightly perturbed versions of existing data or by synthesizing new data. This survey comprehensively reviews a crucial subset of DA techniques, namely Mix-based Data Augmentation (MixDA), which generates novel samples by combining multiple examples. In contrast to traditional DA approaches that operate on single samples or entire datasets, MixDA stands out due to its effectiveness, simplicity, computational efficiency, theoretical foundation, and broad applicability. We begin by introducing a novel taxonomy that categorizes MixDA into Mixup-based, Cutmix-based, and mixture approaches based on a hierarchical perspective of the data mixing operation. Subsequently, we provide an in-depth review of various MixDA techniques, focusing on their underlying motivations. Owing to its versatility, MixDA has penetrated a wide range of applications, which we also thoroughly investigate in this survey. Moreover, we delve into the underlying mechanisms of MixDA's effectiveness by examining its impact on model generalization and calibration, and we provide insights into the model's behavior by analyzing the inherent properties of MixDA. Finally, we recapitulate the critical findings and fundamental challenges of current MixDA studies and outline potential directions for future work. Unlike previous related surveys, which focus on DA approaches in specific domains (e.g., computer vision and natural language processing) or review only a limited subset of MixDA studies, we are the first to provide a systematic survey of MixDA, covering its taxonomy, methodology, application, and explainability. Furthermore, we provide promising directions for researchers interested in this exciting area.


Pages: 38
