A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability

Cited by: 0
Authors
Cao, Chengtai [1]
Zhou, Fan [2]
Dai, Yurou [1]
Wang, Jianping [1]
Zhang, Kunpeng [3]
Affiliations
[1] City Univ Hong Kong, Hong Kong, Peoples R China
[2] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[3] Univ Maryland Coll Pk, College Pk, MD USA
Funding
National Natural Science Foundation of China
Keywords
Data augmentation; regularization; generalization; machine learning; deep learning
DOI
10.1145/3696206
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Data augmentation (DA) is indispensable in modern machine learning and deep neural networks. The basic idea of DA is to construct new training data that improve the model's generalization, either by adding slightly perturbed versions of existing data or by synthesizing new data. This survey comprehensively reviews a crucial subset of DA techniques, namely Mix-based Data Augmentation (MixDA), which generates novel samples by combining multiple examples. In contrast to traditional DA approaches that operate on single samples or entire datasets, MixDA stands out for its effectiveness, simplicity, computational efficiency, theoretical foundation, and broad applicability. We begin by introducing a novel taxonomy that categorizes MixDA into Mixup-based, Cutmix-based, and mixture approaches, based on a hierarchical perspective of the data mixing operation. Subsequently, we provide an in-depth review of various MixDA techniques, focusing on their underlying motivations. Owing to its versatility, MixDA has penetrated a wide range of applications, which we also thoroughly investigate in this survey. Moreover, we delve into the mechanisms underlying MixDA's effectiveness by examining its impact on model generalization and calibration, and we provide insights into the model's behavior by analyzing the inherent properties of MixDA. Finally, we recapitulate the critical findings and fundamental challenges of current MixDA studies and outline potential directions for future work. Unlike previous related surveys, which focus on DA approaches in specific domains (e.g., computer vision and natural language processing) or review only a limited subset of MixDA studies, we are the first to provide a systematic survey of MixDA, covering its taxonomy, methodology, application, and explainability. Furthermore, we provide promising directions for researchers interested in this exciting area.
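As an illustration of what "combining multiple examples" can mean, the sketch below implements the widely used mixup interpolation, where a mixed sample is lam * x_i + (1 - lam) * x_j and lam is drawn from a Beta(alpha, alpha) distribution. This record does not give the survey's own formulation, so the function name `mixup`, the default `alpha=0.2`, and the use of plain Python lists are assumptions made for illustration only.

```python
import random


def mixup(x_i, y_i, x_j, y_j, alpha=0.2, rng=None):
    """Mix two examples with a Beta-sampled weight (illustrative sketch).

    x_i, x_j: feature vectors of equal length (lists of floats).
    y_i, y_j: one-hot (or soft) label vectors of equal length.
    alpha:    Beta concentration parameter; 0.2 is an assumed default.
    """
    rng = rng or random.Random()
    # lam lies in [0, 1]; a small alpha pushes it toward 0 or 1.
    lam = rng.betavariate(alpha, alpha)
    # The same convex combination is applied to features and labels.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix, lam


if __name__ == "__main__":
    x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
                      rng=random.Random(0))
    print(lam, x, y)
```

Because the labels are mixed with the same weight as the inputs, a mixed label vector built from two one-hot labels still sums to one.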
Pages: 38