Cluster-based Weighted Oversampling for Ordinal Regression (CWOS-Ord)

Cited by: 11
Authors
Nekooeimehr, Iman [1]
Lai-Yuen, Susana K. [1]
Affiliations
[1] Univ S Florida, Ind & Management Syst Engn, 4202 East Fowler Ave,ENB 118, Tampa, FL 33620 USA
Keywords
Imbalanced dataset; Ordinal regression; Clustering; Oversampling; CLASSIFICATION; IMBALANCE; ALGORITHM; SMOTE
DOI
10.1016/j.neucom.2016.08.071
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A new oversampling method called Cluster-based Weighted Oversampling for Ordinal Regression (CWOS-Ord) is proposed for addressing ordinal regression with imbalanced datasets. Ordinal regression is a supervised approach for learning the ordinal relationship between classes. In many applications, the dataset is highly imbalanced: instances of some classes (majority classes) occur much more frequently than instances of other classes (minority classes). This significantly degrades classification performance, as classifiers tend to strongly favor the majority classes. Standard oversampling methods can be used to improve the class distribution of the dataset; however, they do not consider the ordinal relationship between the classes. The proposed CWOS-Ord method aims to address this problem by first clustering the minority classes and then oversampling them based on their distances and ordering relationship to the instances of other classes. The number of synthetic instances generated for each cluster depends on its complexity and initial size, so that more synthetic instances are generated for more complex and smaller clusters, while fewer are generated for less complex and larger clusters. As a secondary contribution, existing oversampling methods for two-class classification have been extended to ordinal regression. Results demonstrate that the proposed CWOS-Ord method provides significantly better results than the other methods on the performance measures considered. (C) 2016 Elsevier B.V. All rights reserved.
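The weighting idea in the abstract (more synthetic instances for complex, small clusters; fewer for simple, large ones) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the formulas, so the weight `complexity / size`, the function names, and the SMOTE-style interpolation between two random cluster members are all assumptions made for illustration. The complexity score of each cluster is taken as a given input here, whereas the paper derives it from distances and class ordering.

```python
import math
import random


def allocate_synthetic_counts(clusters, total_new):
    """Split total_new synthetic instances across minority clusters.

    clusters: list of dicts with 'size' (int > 0) and 'complexity' (float > 0).
    ASSUMPTION: weight = complexity / size, so complex and small clusters
    receive more instances. The paper's actual weighting may differ.
    """
    weights = [c["complexity"] / c["size"] for c in clusters]
    total = sum(weights)
    raw = [total_new * w / total for w in weights]
    counts = [math.floor(r) for r in raw]
    # Hand out the instances lost to rounding, largest fractional part first.
    leftover = total_new - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def oversample_cluster(points, n_new, rng):
    """Generate n_new points by linear interpolation inside one cluster.

    ASSUMPTION: plain SMOTE-style interpolation between two randomly chosen
    members; neighbour selection based on class ordering is omitted.
    """
    synthetic = []
    for _ in range(n_new):
        if len(points) > 1:
            a, b = rng.sample(points, 2)
        else:
            a = b = points[0]
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical clusters: one large and simple, one small and complex.
    clusters = [
        {"size": 10, "complexity": 1.0},
        {"size": 5, "complexity": 2.0},
    ]
    counts = allocate_synthetic_counts(clusters, total_new=10)
    print("synthetic instances per cluster:", counts)
    print(oversample_cluster([[0.0, 0.0], [2.0, 2.0]], counts[0], rng))
```

In this sketch the smaller, more complex cluster receives the larger share of the synthetic instances, which mirrors the qualitative behaviour described in the abstract.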
Pages: 51-60
Number of pages: 10