A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability

Cited by: 2
Authors
Cao, Chengtai [1]
Zhou, Fan [2]
Dai, Yurou [1]
Wang, Jianping [1]
Zhang, Kunpeng [3]
Affiliations
[1] City Univ Hong Kong, Hong Kong, Peoples R China
[2] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[3] Univ Maryland Coll Pk, College Pk, MD USA
Funding
National Natural Science Foundation of China;
Keywords
Data augmentation; regularization; generalization; machine learning; deep learning;
DOI
10.1145/3696206
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Data augmentation (DA) is indispensable in modern machine learning and deep neural networks. The basic idea of DA is to construct new training data that improve the model's generalization, either by adding slightly perturbed versions of existing data or by synthesizing new data. This survey comprehensively reviews a crucial subset of DA techniques, namely Mix-based Data Augmentation (MixDA), which generates novel samples by combining multiple examples. In contrast to traditional DA approaches that operate on single samples or entire datasets, MixDA stands out for its effectiveness, simplicity, computational efficiency, theoretical foundation, and broad applicability. We begin by introducing a novel taxonomy that categorizes MixDA into Mixup-based, Cutmix-based, and mixture approaches based on a hierarchical perspective of the data mixing operation. Subsequently, we provide an in-depth review of various MixDA techniques, focusing on their underlying motivations. Owing to its versatility, MixDA has penetrated a wide range of applications, which we also thoroughly investigate in this survey. Moreover, we delve into the underlying mechanisms of MixDA's effectiveness by examining its impact on model generalization and calibration, and we provide insights into the model's behavior by analyzing the inherent properties of MixDA. Finally, we recapitulate the critical findings and fundamental challenges of current MixDA studies and outline potential directions for future work. Unlike previous surveys, which focus on DA approaches in specific domains (e.g., computer vision and natural language processing) or review only a limited subset of MixDA studies, we are the first to provide a systematic survey of MixDA, covering its taxonomy, methodology, applications, and explainability. Furthermore, we provide promising directions for researchers interested in this exciting area.
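As a rough illustration of what "combining multiple examples" can mean, the sketch below follows the widely used Mixup formulation, in which two training pairs are blended by a weight drawn from a Beta distribution. It is not taken from the survey itself: the function name `mixup`, the parameter `alpha`, and the use of plain Python lists with one-hot labels are assumptions made for this example.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=1.0, rng=random):
    """Blend two (features, one-hot label) pairs with one shared weight.

    Hypothetical helper for illustration only; `alpha` is the assumed
    shape parameter of the symmetric Beta(alpha, alpha) distribution.
    """
    # Draw the mixing weight lam in [0, 1].
    lam = rng.betavariate(alpha, alpha)
    # Interpolate the features and the labels with the same weight.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Usage: mix one sample of class 0 with one sample of class 1.
x_new, y_new, lam = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

Because the labels are mixed with the same weight as the inputs, the mixed label remains a valid probability vector, so it can serve as a soft target during training.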
Pages: 38