Multi-Exemplar based Clustering for Imbalanced Data

Citations: 0
Authors
Wang, Yangtao [1]
Chen, Lihui [1]
Affiliation
[1] Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore, Singapore
Source
2014 13TH INTERNATIONAL CONFERENCE ON CONTROL AUTOMATION ROBOTICS & VISION (ICARCV) | 2014
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Clustering is an important unsupervised data analysis technique for uncovering the underlying structure of unlabelled data. Many clustering approaches have been reported in the literature, and some of them, such as k-means and fuzzy k-means, are widely applied to real-world problems. However, these algorithms may perform poorly on imbalanced data, in which the classes have very different sizes: they tend to generate clusters of similar sizes, a behaviour called the "uniform effect". To prevent the uniform effect and improve clustering performance, this paper proposes a new approach for imbalanced data called multi-exemplar merging clustering (MEMC). The approach consists of two stages: a multiple-exemplar identification stage and an exemplar merging stage. In the first stage, multiple exemplars, which are data objects selected to represent the data set, are identified using MEAP. In the second stage, the exemplars are merged based on the proposed overlapping measure (OM), which reflects the degree of overlap between clusters. Experiments on several synthetic and real-world data sets show the effectiveness of the proposed approach for clustering imbalanced data.
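The two-stage idea in the abstract (first pick many exemplars, then merge exemplars whose groups overlap) can be illustrated with a minimal sketch. This record gives no details of MEAP or of the overlapping measure (OM), so the code below uses stand-ins that are assumptions, not the authors' method: greedy farthest-point selection replaces MEAP, and a simple nearest-gap test (two exemplar groups are merged if any pair of their points lies within `gap`) replaces OM. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def farthest_point_exemplars(points, k):
    """Stage 1 stand-in (NOT MEAP): pick k data points as exemplars, greedily far apart."""
    exemplars = [0]
    dist = [math.dist(p, points[0]) for p in points]
    while len(exemplars) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        exemplars.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return exemplars

def assign(points, exemplars):
    """Label each point with the position (0..k-1) of its nearest exemplar."""
    return [min(range(len(exemplars)),
                key=lambda e: math.dist(p, points[exemplars[e]]))
            for p in points]

def merge_exemplars(points, labels, k, gap):
    """Stage 2 stand-in (NOT the paper's OM): union exemplar groups that come within `gap`."""
    parent = list(range(k))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] != labels[j] and math.dist(points[i], points[j]) <= gap:
                parent[find(labels[i])] = find(labels[j])
    return [find(lab) for lab in labels]

# Toy imbalanced data: a 400-point grid and a well-separated 9-point grid.
big = [(0.25 * i, 0.25 * j) for i in range(20) for j in range(20)]
small = [(10 + 0.25 * i, 10 + 0.25 * j) for i in range(3) for j in range(3)]
points = big + small

exemplars = farthest_point_exemplars(points, k=8)      # over-segment on purpose
labels = assign(points, exemplars)
merged = merge_exemplars(points, labels, k=8, gap=0.3)
sizes = sorted(Counter(merged).values())
print(sizes)
```

On this toy data the large grid is first split across several exemplars and then rejoined in the merging stage, so the final cluster sizes follow the data rather than being forced toward similar sizes. The `gap` threshold and the number of exemplars are illustrative choices only.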
Pages: 1068-1073
Page count: 6