Cluster-based Weighted Oversampling for Ordinal Regression (CWOS-Ord)

Cited by: 11
Authors
Nekooeimehr, Iman [1]
Lai-Yuen, Susana K. [1]
Affiliations
[1] Univ S Florida, Ind & Management Syst Engn, 4202 East Fowler Ave,ENB 118, Tampa, FL 33620 USA
Keywords
Imbalanced dataset; Ordinal regression; Clustering; Oversampling; CLASSIFICATION; IMBALANCE; ALGORITHM; SMOTE;
DOI
10.1016/j.neucom.2016.08.071
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A new oversampling method called Cluster-based Weighted Oversampling for Ordinal Regression (CWOS-Ord) is proposed for addressing ordinal regression with unbalanced datasets. Ordinal regression is a supervised approach for learning the ordinal relationship between classes. In many applications, the dataset is highly imbalanced where the instances of some classes (majority classes) occur much more frequently than instances of other classes (minority classes). This significantly degrades the classification performance as classifiers tend to strongly favor the majority classes. Standard oversampling methods can be used to improve the dataset class distribution; however, they do not consider the ordinal relationship between the classes. The proposed CWOS-Ord method aims to address this problem by first clustering minority classes and then oversampling them based on their distances and ordering relationship to other classes' instances. The final size to oversample the clusters depends on their complexity and their initial size so that more synthetic instances are generated for more complex and smaller clusters while fewer instances are generated for less complex and larger clusters. As a secondary contribution, existing oversampling methods for two-class classification have been extended for ordinal regression. Results demonstrate that the proposed CWOS-Ord method provides significantly better results compared to other methods based on the performance measures. (C) 2016 Elsevier B.V. All rights reserved.
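The allocation idea in the abstract (more synthetic instances for smaller, more complex clusters) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: the weight `complexity / size`, the function names `allocate_synthetic_counts` and `interpolate`, and the SMOTE-style interpolation are illustrative stand-ins, and the per-cluster complexity scores are taken as given rather than computed from distances and class ordering as the paper describes.

```python
import random


def allocate_synthetic_counts(cluster_sizes, cluster_complexities, total_new):
    """Split total_new synthetic instances across minority clusters.

    ASSUMPTION: weight = complexity / size. This is an illustrative choice,
    not the weighting published for CWOS-Ord. Sizes must be positive and
    at least one complexity score must be non-zero.
    """
    weights = [c / s for c, s in zip(cluster_complexities, cluster_sizes)]
    total_weight = sum(weights)
    raw = [total_new * w / total_weight for w in weights]
    counts = [int(r) for r in raw]
    # Hand out the rounding leftover to the largest fractional remainders.
    leftover = total_new - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def interpolate(cluster, n_new, rng):
    """Create n_new points by linear interpolation between random pairs
    of cluster members (SMOTE-style; a hypothetical stand-in here)."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.choice(cluster), rng.choice(cluster)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Two hypothetical minority clusters: a small complex one and a large simple one.
sizes = [10, 40]
complexities = [0.8, 0.2]
counts = allocate_synthetic_counts(sizes, complexities, total_new=20)

rng = random.Random(0)
new_points = interpolate([(0.0, 0.0), (1.0, 1.0)], counts[1] + 3, rng)
```

With these inputs the small, complex cluster receives most of the 20 new instances, which matches the qualitative behavior described in the abstract; the exact split depends entirely on the assumed weighting.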
Pages: 51-60
Page count: 10