Improving SVM classification on imbalanced time series data sets with ghost points

Cited by: 1
Authors
Suzan Köknar-Tezel
Longin Jan Latecki
Institution
[1] Temple University, Department of Computer and Information Sciences
Source
Knowledge and Information Systems | 2011 / Vol. 28
Keywords
Imbalanced data sets; Support Vector Machines; Time series
DOI
Not available
Abstract
Imbalanced data sets present a particular challenge to the data mining community. Often, it is the rare event that is of interest, and misclassifying the rare event costs more than misclassifying the usual event. When the data are highly skewed toward the usual event, it can be very difficult for a learning system to detect the rare event accurately. Many approaches for handling imbalanced data sets have been proposed in recent years, from under-sampling the majority class to adding synthetic points to the minority class in feature space. However, distances between time series are known to be non-Euclidean and non-metric, since comparing time series requires warping in time. This makes it impossible to apply standard methods such as SMOTE, which insert synthetic data points in feature space. We present an innovative approach that augments the minority class by adding synthetic points in distance space. We then use Support Vector Machines for classification. Our experimental results on standard time series data sets show that our synthetic points significantly improve the classification rate of the rare events and, in most cases, also improve the overall accuracy of SVMs. We also show how adding our synthetic points can aid in the visualization of time series data sets.
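To make the claim that warped time-series distances are non-metric concrete, here is a minimal dynamic time warping (DTW) sketch. DTW is the standard warping distance for time series, but the abstract does not say which variant the authors use; the absolute-difference local cost and the absence of a warping window below are assumptions. Three short sequences are enough to show a zero distance between different series and a violation of the triangle inequality.

```python
def dtw(x, y):
    """Dynamic time warping distance between two 1-D sequences.

    Assumed variant: absolute-difference local cost, steps (1,0), (0,1), (1,1),
    no warping window.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = cost of the best warping path aligning x[:i] with y[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # x advances, y[j-1] is reused
                                 D[i][j - 1],      # y advances, x[i-1] is reused
                                 D[i - 1][j - 1])  # both advance
    return D[n][m]


a, b, c = [0, 0, 0], [0], [1]
print(dtw(a, b))  # 0.0: a != b, yet their distance is zero
print(dtw(b, c))  # 1.0
print(dtw(a, c))  # 3.0 > 0.0 + 1.0, so the triangle inequality fails
```

Because DTW can align one sample against many, `[0, 0, 0]` and `[0]` are indistinguishable, while their distances to `[1]` differ by a factor of three. There is no consistent vector geometry in which to interpolate between such series, which is why feature-space methods like SMOTE do not carry over.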
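The abstract says synthetic points are added in distance space but does not give the formula for their distances, so the following is only an illustrative sketch under an explicit assumption, not the authors' ghost-point construction. It works entirely on a precomputed distance matrix (for example, pairwise DTW distances). For each minority point it picks the nearest other minority point and appends one synthetic point notionally at their midpoint, estimating that point's distances with the median-length (Apollonius) formula, which is exact only in Euclidean space. The function names `add_midpoint_ghost` and `oversample_minority` are hypothetical.

```python
import math


def add_midpoint_ghost(D, i, j):
    """Return a copy of distance matrix D with one extra row/column for a
    synthetic point g placed (notionally) midway between points i and j.

    ASSUMPTION (illustrative, not taken from the paper): distances from g use
    the median-length formula, exact in Euclidean space:
        d(g,k)^2 = (d(i,k)^2 + d(j,k)^2) / 2 - d(i,j)^2 / 4
    For non-metric inputs such as DTW the right-hand side can be negative,
    so it is clipped at zero.
    """
    n = len(D)
    dij2 = D[i][j] ** 2
    g_row = []
    for k in range(n):
        val = (D[i][k] ** 2 + D[j][k] ** 2) / 2.0 - dij2 / 4.0
        g_row.append(math.sqrt(max(val, 0.0)))
    # Copy the old rows, append g's distances as a new last column,
    # then append g's own row (distance 0 to itself).
    new_D = [row[:] + [g_row[k]] for k, row in enumerate(D)]
    new_D.append(g_row + [0.0])
    return new_D


def oversample_minority(D, labels, minority_label):
    """Add one ghost per original minority point, between it and its nearest
    other minority point. Needs at least two minority points."""
    minority = [k for k, y in enumerate(labels) if y == minority_label]
    D_new, labels_new = D, list(labels)
    for i in minority:
        # Nearest neighbour among the ORIGINAL minority points only.
        j = min((k for k in minority if k != i), key=lambda k: D[i][k])
        D_new = add_midpoint_ghost(D_new, i, j)
        labels_new.append(minority_label)
    return D_new, labels_new


# Toy example: three points on a line at positions 0, 2 and 5 (Euclidean),
# the first two labelled as the minority class 1.
D = [[0.0, 2.0, 5.0],
     [2.0, 0.0, 3.0],
     [5.0, 3.0, 0.0]]
D2, y2 = oversample_minority(D, [1, 1, 0], minority_label=1)
print(y2)         # [1, 1, 0, 1, 1]
print(D2[3][:3])  # [1.0, 1.0, 4.0]: the ghost sits at position 1
```

On Euclidean toy data the estimate recovers the true midpoint distances; on DTW distances it is only an approximation, which is why negative values are clipped. The augmented matrix would still have to be turned into SVM input (for instance through a distance-based kernel), and how the paper does that is not stated in the abstract.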
Pages: 1-23
Page count: 22