SimMix: Local similarity-aware data augmentation for time series

Cited by: 0
Authors
Liu, Pin [1, 3]
Guo, Yuxuan [2]
Chen, Pengpeng [2]
Chen, Zhijun [3]
Wang, Rui [3]
Wang, Yuzhu [1]
Shi, Bin [4]
Affiliations
[1] China Univ Geosci, Sch Informat Engn, Beijing 100083, Peoples R China
[2] Aviat Syst Engn Inst China ASEI, Beijing 100048, Peoples R China
[3] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
[4] Xi An Jiao Tong Univ, Sch Comp Sci & Technol, Xian 710049, Peoples R China
Keywords
Time series; Data augmentation; Local similarity; Classification; Augmentation intensity; Neural network
DOI
10.1016/j.eswa.2024.124793
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We find that local similarity is an essential factor for data augmentation in deep learning tasks on time series data, which are prevalent in domains such as smart healthcare, intelligent transportation, and smart finance. Through empirical and theoretical analysis, we find that deep learning models achieve excellent performance only when the data augmentation method operates at an appropriate intensity of local similarity: during augmentation, intra-class local similarity that is too large or too small degrades model performance. Based on this finding, we propose a time series augmentation method based on intra-class Similarity Mixing (SimMix), which accurately controls the intensity by quantifying and adjusting the similarity between augmented samples and original samples. Building on a PAC (Probably Approximately Correct) theoretical foundation, we design a cutmix strategy for non-equal-length segments to eliminate the semantic information loss and noise introduction found in traditional methods. Through extensive validation on 10 real-world datasets, we demonstrate that the proposed method outperforms the state of the art by a large margin.
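To make the idea of mixing non-equal-length segments within a class more concrete, the following is a minimal sketch and not the authors' SimMix implementation. It assumes univariate series, uses linear interpolation to fit a donor segment into a window of a different length, and uses Pearson correlation as a stand-in similarity score. The function names (`resample`, `intra_class_cutmix`, `similarity`) and all parameters are hypothetical.

```python
import numpy as np


def resample(segment, new_len):
    """Linearly interpolate a 1-D segment to new_len points (assumed choice)."""
    old = np.linspace(0.0, 1.0, num=len(segment))
    new = np.linspace(0.0, 1.0, num=new_len)
    return np.interp(new, old, segment)


def intra_class_cutmix(x, donor, dst_len, src_len, rng):
    """Replace a dst_len window of x with a src_len window of donor.

    x and donor are assumed to belong to the same class. The donor
    segment is resampled so the output keeps the length of x.
    """
    dst_start = rng.integers(0, len(x) - dst_len + 1)
    src_start = rng.integers(0, len(donor) - src_len + 1)
    patch = resample(donor[src_start:src_start + src_len], dst_len)
    out = x.copy()
    out[dst_start:dst_start + dst_len] = patch
    return out


def similarity(a, b):
    """Pearson correlation, a placeholder for the paper's similarity measure."""
    return float(np.corrcoef(a, b)[0, 1])


# Toy example: two slightly shifted sine waves standing in for one class.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 128)
x = np.sin(t)
donor = np.sin(t + 0.1)
aug = intra_class_cutmix(x, donor, dst_len=32, src_len=48, rng=rng)
score = similarity(aug, x)  # one could accept or reject aug based on this score
```

In this sketch the augmentation intensity would be tuned through `dst_len`, `src_len`, and a threshold on `score`; how SimMix actually quantifies and adjusts similarity is described in the paper itself.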
Pages: 10