Rule Extraction From Support Vector Machines Using Ensemble Learning Approach: An Application for Diagnosis of Diabetes

Cited by: 71
Authors
Han, Longfei [1]
Luo, Senlin [2]
Yu, Jianmin [2]
Pan, Limin [2]
Chen, Songjing [2]
Affiliations
[1] Beijing Inst Technol, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Informat Syst & Secur & Countermeasures Expt Ctr, Beijing 100081, Peoples R China
Keywords
diagnosis of diabetes; ensemble learning; random forest (RF); rule extraction; support vector machines (SVMs); RISK SCORE; CLASSIFICATION; TOOLS
DOI
10.1109/JBHI.2014.2325615
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Diabetes mellitus is a chronic disease and a worldwide public health challenge. It has been shown that 50-80% of type 2 diabetes mellitus (T2DM) cases are undiagnosed. In this paper, support vector machines (SVMs) are used to screen for diabetes, and an ensemble learning module is added that turns the "black box" of SVM decisions into comprehensible, transparent rules and also helps with the class imbalance problem. Results on China Health and Nutrition Survey data show that the proposed ensemble learning method generates rule sets with a weighted average precision of 94.2% and a weighted average recall of 93.9% across all classes. Furthermore, the hybrid system can serve as a tool for the diagnosis of diabetes and can offer a second opinion to lay users.
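The abstract reports weighted average precision and recall without defining them. The sketch below assumes the common support-weighted definition, in which each class's precision and recall is weighted by that class's share of the true labels; the paper may compute the figures differently. The function name and the toy labels are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def weighted_precision_recall(y_true, y_pred):
    """Support-weighted average of per-class precision and recall."""
    support = Counter(y_true)      # number of true instances per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    precision = 0.0
    recall = 0.0
    for label, count in support.items():
        weight = count / n
        # A class that is never predicted contributes zero precision.
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        precision += weight * p
        recall += weight * r
    return precision, recall


# Hypothetical toy labels for illustration only.
y_true = ["diabetic", "diabetic", "healthy", "healthy", "healthy", "healthy"]
y_pred = ["diabetic", "healthy", "healthy", "healthy", "healthy", "diabetic"]
precision, recall = weighted_precision_recall(y_true, y_pred)
print(f"weighted precision={precision:.3f}, weighted recall={recall:.3f}")
```

Under this definition the weighted recall equals overall accuracy, so on imbalanced data it is worth reading alongside the per-class values.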
Pages: 728-734
Page count: 7