Decision Support Model for Time Series Data Augmentation Method Selection

Cited by: 0
Authors
Joubaud, Dorian [1]
Kubler, Sylvain [1]
Lourenco, Raoni [1]
Cordy, Maxime [1]
Le Traon, Yves [1]
Affiliations
[1] University of Luxembourg, Interdisciplinary Centre for Security, Reliability and Trust (SnT), L-1855 Kirchberg, Luxembourg
Keywords
Feature extraction; Benchmark testing; Data models; Data augmentation; Artificial neural networks; Computer architecture; Computational modeling; Complexity theory; Adaptation models; Synthetic data; Imbalanced time-series classification; Oversampling; Machine learning; Artificial intelligence; Performance
DOI
10.1109/ACCESS.2024.3516369
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data augmentation (DA) plays a crucial role in machine learning by improving model generalization and mitigating data scarcity, which is particularly prevalent in domains with limited access to sensitive information or rare events. Although various DA techniques exist for imbalanced time-series classification (ITSC) problems, comprehensive guidelines for selecting the most appropriate technique based on input data features and the chosen classifier are lacking. This paper empirically demonstrates the limitations of conventional data balancing practices through experiments on 720 ITSC datasets, using 7 classifier architectures and 6 DA techniques (TimeGAN, SMOTE, ADASYN, Random Oversampling, Jittering, Time Warping). Our study explores the relationship between DA techniques and the inherent characteristics of ITSC datasets and classifiers, and introduces a novel ML-based decision support system, BALANCER (imBALanced AugmeNtation reCommendER), trained on empirical data to offer ML practitioners an automated way to select the most appropriate DA method for their specific application. BALANCER's recommendation model also predicts the performance enhancement expected from data balancing with the recommended method. Evaluated against traditional mean rank recommendations, BALANCER shows significant improvements, achieving an average Kendall's tau of 0.36 (compared to -0.01 for traditional mean rank recommendations) and a root mean square error of $1.5\times 10^{-2}$ on individual predictions. The reasons behind this disparity are analyzed using eXplainable AI (XAI), demonstrating that BALANCER can uncover deeper and more complex feature interactions than a mean rank recommendation strategy.
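To make the two evaluation metrics in the abstract concrete, the following is a minimal sketch of how Kendall's tau and root mean square error could be computed for a single dataset/classifier pair. It is not the paper's evaluation code: the function names, the use of the tau-a variant (no tie correction), and all gain values are assumptions chosen purely for illustration.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_a(x, y):
    """Kendall's tau-a between two equal-length score lists (no tie correction)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both lists order the pair the same way
        elif sign < 0:
            discordant += 1   # the two lists disagree on the pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))


# The six DA techniques named in the abstract.
methods = ["TimeGAN", "SMOTE", "ADASYN", "Random Oversampling",
           "Jittering", "Time Warping"]

# Hypothetical performance gains per method (made-up numbers, not paper results).
observed_gain = [0.02, 0.05, 0.04, 0.03, -0.01, 0.00]
predicted_gain = [0.01, 0.06, 0.03, 0.04, 0.00, -0.02]

tau = kendall_tau_a(predicted_gain, observed_gain)
error = rmse(predicted_gain, observed_gain)

# The recommended method is the one with the highest predicted gain.
recommended = methods[predicted_gain.index(max(predicted_gain))]

print(f"Kendall's tau: {tau:.2f}, RMSE: {error:.4f}, recommended: {recommended}")
```

A tau close to 1 means the predicted ordering of DA methods closely matches the observed one, while a value near 0 (such as the -0.01 reported for mean rank recommendations) means the ordering carries essentially no information about which method works best on that dataset.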
Pages: 196553-196566
Page count: 14