A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability

Authors
Cao, Chengtai [1]
Zhou, Fan [2]
Dai, Yurou [1]
Wang, Jianping [1]
Zhang, Kunpeng [3]
Affiliations
[1] City Univ Hong Kong, Hong Kong, Peoples R China
[2] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[3] Univ Maryland Coll Pk, College Pk, MD USA
Funding
National Natural Science Foundation of China
Keywords
Data augmentation; regularization; generalization; machine learning; deep learning
DOI
10.1145/3696206
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Data augmentation (DA) is indispensable in modern machine learning and deep neural networks. The basic idea of DA is to construct new training data that improves a model's generalization, either by adding slightly perturbed versions of existing data or by synthesizing new data. This survey comprehensively reviews a crucial subset of DA techniques, namely Mix-based Data Augmentation (MixDA), which generates novel samples by combining multiple examples. In contrast to traditional DA approaches that operate on single samples or entire datasets, MixDA stands out for its effectiveness, simplicity, computational efficiency, theoretical foundation, and broad applicability. We begin by introducing a novel taxonomy that categorizes MixDA into Mixup-based, Cutmix-based, and mixture approaches, based on a hierarchical perspective of the data mixing operation. We then provide an in-depth review of various MixDA techniques, focusing on their underlying motivations. Owing to its versatility, MixDA has been adopted in a wide range of applications, which we also investigate thoroughly in this survey. Moreover, we examine why MixDA is effective by studying its impact on model generalization and calibration, and we provide insights into model behavior by analyzing the inherent properties of MixDA. Finally, we recapitulate the critical findings and fundamental challenges of current MixDA studies and outline potential directions for future work. Unlike previous surveys, which focus on DA approaches in specific domains (e.g., computer vision and natural language processing) or review only a limited subset of MixDA studies, we are the first to provide a systematic survey of MixDA covering its taxonomy, methodology, applications, and explainability.
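The abstract describes MixDA as generating new samples by combining multiple examples but gives no formula. As a rough illustration only, the sketch below follows the commonly cited Mixup formulation, in which two inputs and their one-hot labels are blended with a weight drawn from a Beta distribution. The function name `mixup`, the `alpha` default, and the toy vectors are assumptions made for this example and are not taken from the survey.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=None):
    """Blend two (features, one-hot label) pairs with a Beta-sampled weight.

    Hypothetical helper for illustration; alpha=0.2 is an assumed default.
    """
    rng = rng or random.Random()
    # Mixing weight lam lies in [0, 1]; small alpha pushes it toward 0 or 1.
    lam = rng.betavariate(alpha, alpha)
    # The same convex combination is applied to inputs and to labels.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Toy example: two 3-dimensional inputs from two different classes.
x_mix, y_mix, lam = mixup(
    [1.0, 0.0, 2.0], [1.0, 0.0],
    [0.0, 4.0, 2.0], [0.0, 1.0],
    rng=random.Random(0),
)
print(lam, x_mix, y_mix)
```

Because the label is mixed with the same weight as the input, the result is a soft label whose entries still sum to one. Cutmix-style methods, as usually described, replace a spatial region of one input with a patch from another rather than blending every element.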
Pages: 38