MOGT: OVERSAMPLING WITH A PARSIMONIOUS MIXTURE OF GAUSSIAN TREES MODEL FOR IMBALANCED TIME-SERIES CLASSIFICATION

Cited by: 1
Authors
Pang, John Z. F. [1]
Cao, Hong [2]
Tan, Vincent Y. F. [2, 3]
Affiliations
[1] Nanyang Technol Univ, Sch Phys & Math Sci, Nanyang, Singapore
[2] ASTAR, Inst Infocomm Res I2R, Dept Anal Dept, Singapore, Singapore
[3] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
Source
2013 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) | 2013
Keywords
Imbalanced dataset; Time-series; Oversampling; Gaussian graphical models; Mixture models; Multi-modality; LEARNING GRAPHICAL MODELS; DATA SETS; SMOTE;
DOI
10.1109/MLSP.2013.6661937
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
We propose a novel framework that uses a parsimonious statistical model, known as a mixture of Gaussian trees, to model the possibly multi-modal minority class and thereby address the problem of imbalanced time-series binary classification. By exploiting the fact that close-by time points are highly correlated, our model significantly reduces the number of covariance parameters to be estimated from O(d^2) to O(Ld), where L denotes the number of mixture components and d is the dimension. Thus our model is particularly effective for modelling high-dimensional time-series with a limited number of instances in the minority positive class. We conduct extensive classification experiments on several well-known time-series datasets (both single- and multi-modal) by first randomly generating synthetic instances from our learned mixture model to correct the imbalance. We then compare our results to several state-of-the-art oversampling techniques, and the results demonstrate that when our proposed model is used, the same support vector machine classifier achieves much better classification accuracy across the range of datasets. In fact, the proposed method achieves the best average performance on 27 of the 30 multi-modal datasets according to the F-value metric.
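The parameter reduction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the per-component count of 2d - 1 (d variances plus d - 1 edge covariances), the function names, and the toy random-walk data are all assumptions made for illustration. The tree structure is chosen with a Chow-Liu style maximum-weight spanning tree over pairwise Gaussian mutual information, which is one common way to fit a single Gaussian tree; fitting the full mixture and sampling from it are not shown.

```python
import math
import random


def full_cov_params(d):
    # Free entries of one symmetric d x d covariance matrix: O(d^2).
    return d * (d + 1) // 2


def tree_mixture_cov_params(d, L):
    # Assumed count: each tree component keeps d variances and
    # d - 1 edge covariances, giving O(L * d) in total.
    return L * (2 * d - 1)


def correlation(xs, ys):
    # Sample Pearson correlation of two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def gaussian_chow_liu_edges(series):
    """Return the d - 1 edges of a maximum-weight spanning tree.

    `series` is a list of instances, each a sequence of d time points.
    Edge weights are the Gaussian mutual information -0.5 * log(1 - rho^2).
    """
    d = len(series[0])
    cols = [[row[t] for row in series] for t in range(d)]

    def mi(i, j):
        rho2 = min(correlation(cols[i], cols[j]) ** 2, 1.0 - 1e-12)
        return -0.5 * math.log(1.0 - rho2)

    # Prim's algorithm: best[j] = (weight, source) of the heaviest edge
    # linking node j, not yet in the tree, to a node already in the tree.
    best = {j: (mi(0, j), 0) for j in range(1, d)}
    edges = []
    while best:
        j = max(best, key=lambda k: best[k][0])
        _, src = best.pop(j)
        edges.append((src, j))
        for k in best:
            w = mi(j, k)
            if w > best[k][0]:
                best[k] = (w, j)
    return edges


# Toy data (hypothetical): 50 random walks of length 8.
random.seed(0)
toy = []
for _ in range(50):
    x, walk = 0.0, []
    for _ in range(8):
        x += random.gauss(0.0, 1.0)
        walk.append(x)
    toy.append(walk)

print(full_cov_params(100), tree_mixture_cov_params(100, 3))
print(gaussian_chow_liu_edges(toy))
```

With d = 100 and L = 3, the counts under these assumptions are 5050 covariance parameters for a full matrix against 597 for the tree mixture. On strongly autocorrelated data such as the toy random walks, the selected edges tend to link neighbouring time points, though finite samples do not guarantee an exact chain.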
Pages: 6