Large cost-sensitive margin distribution machine for imbalanced data classification

Cited by: 30
Authors
Cheng, Fanyong [1, 2]
Zhang, Jing [1]
Wen, Cuihong [1]
Liu, Zhaohua [3]
Li, Zuoyong [2]
Affiliations
[1] Hunan Univ, Coll Elect & Informat Engn, Lushan South Rd, Changsha 410082, Hunan, Peoples R China
[2] Minjiang Univ, Dept Comp Sci, Fujian Prov Key Lab Informat Proc & Intelligent C, Fuzhou 350121, Peoples R China
[3] Hunan Univ Sci & Technol, Sch Informat & Elect Engn, Xiangtan 411201, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Margin distribution; Cost-sensitive learning; Imbalanced training data; Balanced detection rate;
DOI
10.1016/j.neucom.2016.10.053
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper develops cost-sensitive margin distribution learning and proposes the Large Cost-Sensitive margin Distribution Machine (LCSDM) to obtain a balanced detection rate on imbalanced training data. Recent margin theory has shown that the margin distribution, rather than a single margin, is more critical to generalization performance. The Large margin Distribution Machine (LDM) was designed to achieve superior classification performance and strong generalization performance. However, on imbalanced training data LDM generally yields an imbalanced margin distribution between the two classes. This generally leads to a lower detection rate for the minority class, which conflicts with the need for a high minority-class detection rate in many real applications. Therefore, cost-sensitive margin distribution learning is proposed to obtain a balanced margin distribution and detection rate between the two classes. Furthermore, this research derives the relation between the cost-sensitive parameter and the in-class detection rate, and designs LCSDM to obtain a balanced detection rate. Experimental results show that LCSDM can gradually increase the margin distribution of the minority class to obtain a more balanced detection rate. As a general learning method, LCSDM is especially applicable to imbalanced data classification.
Pages: 45-57
Number of pages: 13