Integrating TANBN with cost sensitive classification algorithm for imbalanced data in medical diagnosis

Cited by: 69
Authors
Gan, Dan [1]
Shen, Jiang [1]
An, Bang [1]
Xu, Man [2]
Liu, Na [1]
Affiliations
[1] Tianjin Univ, Coll Management & Econ, Tianjin 300072, Peoples R China
[2] Nankai Univ, Business Sch, Tianjin 300071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
TANBN; Cost sensitive; Integrated learning; Classification algorithm; Imbalanced data; Medical diagnosis; RANDOM FOREST ALGORITHM; FEATURE-SELECTION; SAMPLING METHOD; ENSEMBLE;
DOI
10.1016/j.cie.2019.106266
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
For imbalanced classification problems, most traditional classification models focus only on finding a classifier that maximizes classification accuracy under a fixed misclassification cost, and do not take into account that the misclassification cost can change with the sample probability distribution. As far as we know, cost-sensitive learning can be used effectively to solve imbalanced data classification problems. In this regard, we propose an integrated TANBN with cost-sensitive classification algorithm (AdaC-TANBN) to overcome this drawback and improve classification accuracy. The AdaC-TANBN algorithm uses a variable misclassification cost, determined by the sample distribution probability, to train the classifier, and then classifies imbalanced data in medical diagnosis. The effectiveness of the proposed approach is examined on the Cleveland heart dataset (Heart), the Indian liver patient dataset (ILPD), the Dermatology dataset, and the Cervical cancer risk factors dataset (CCRF) from the UCI Machine Learning Repository. The experimental results indicate that the AdaC-TANBN algorithm can outperform other state-of-the-art comparative methods.
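The idea of a misclassification cost that depends on the sample distribution can be illustrated with a minimal sketch. This is not the paper's AdaC-TANBN update, which the abstract does not specify. It assumes an inverse-frequency cost rule and a generic cost-sensitive boosting step in which the cost multiplies the sample weight outside the exponent; the function names `class_costs` and `reweight`, the value of `alpha`, and the toy labels are all hypothetical.

```python
import math
from collections import Counter


def class_costs(labels):
    """Assumed rule: cost inversely proportional to class frequency,
    scaled so that the rarest class has cost 1.0."""
    counts = Counter(labels)
    n = len(labels)
    raw = {cls: n / k for cls, k in counts.items()}
    top = max(raw.values())
    return {cls: value / top for cls, value in raw.items()}


def reweight(weights, labels, predictions, costs, alpha):
    """One cost-sensitive boosting step (generic sketch):
    w_i <- cost(y_i) * w_i * exp(-alpha * s_i), then normalize,
    where s_i is +1 for a correct prediction and -1 for an error."""
    updated = []
    for w, y, h in zip(weights, labels, predictions):
        s = 1 if y == h else -1
        updated.append(costs[y] * w * math.exp(-alpha * s))
    total = sum(updated)
    return [u / total for u in updated]


# Toy imbalanced sample: 8 majority (0) and 2 minority (1) labels.
labels = [0] * 8 + [1] * 2
# A weak learner that always predicts the majority class.
predictions = [0] * 10
costs = class_costs(labels)
weights = [1.0 / len(labels)] * len(labels)
new_weights = reweight(weights, labels, predictions, costs, alpha=0.5)
```

In this toy run the two misclassified minority samples end up with larger weights than the correctly classified majority samples, so the next base classifier would be pushed toward the minority class.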
Pages: 9