A bi-objective hybrid optimization algorithm to reduce noise and data dimension in diabetes diagnosis using support vector machines

Cited by: 31
Authors
Alirezaei, Mahsa [1]
Niaki, Seyed Taghi Akhavan [1]
Niaki, Seyed Armin Akhavan [2, 3]
Affiliations
[1] Sharif Univ Technol, Dept Ind Engn, POB 11155-9414,Azadi Ave, Tehran 1458889694, Iran
[2] West Virginia Univ, Dept Stat, Morgantown, WV USA
[3] Natl Energy Partners, Proc & Operat Analyt Engn Dept, Voorhees Township, NJ USA
Keywords
Diabetes diagnosis; Feature selection; Meta-heuristic algorithms; K-means algorithms; Support vector machine; FEATURE-SELECTION; CLASSIFICATION
DOI
10.1016/j.eswa.2019.02.037
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Diabetes mellitus is a medical condition that data miners study for reasons such as the significant health complications it causes in affected people and its economic impact on healthcare networks. To find the main causes of the disease, researchers examine patients' lifestyles, hereditary information, and related factors. The goal of data mining in this context is to find patterns that make early detection and proper treatment of the disease easier. Because of the high volume of data involved in therapeutic contexts and disease diagnosis, providing the intended treatment method becomes almost impossible over a short period of time. This justifies the use of pre-processing techniques and data reduction methods in such contexts, where clustering and meta-heuristic algorithms play important roles. In this paper, a method based on the k-means clustering algorithm is first used to detect and delete outliers. Then, to select significant and effective features, four bi-objective meta-heuristic algorithms are employed to choose the smallest number of significant features that yields the highest classification accuracy using support vector machines (SVM). In addition, 10-fold cross-validation (CV) is used to validate the constructed model. Using real case data, it is concluded that the multi-objective firefly algorithm (MOFA) and the multi-objective imperialist competitive algorithm (MOICA), each with 100% classification accuracy, outperform the non-dominated sorting genetic algorithm (NSGA-II) and multi-objective particle swarm optimization (MOPSO), with accuracies of 98.2% and 94.6%, respectively. (C) 2019 Elsevier Ltd. All rights reserved.
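The bi-objective trade-off described in the abstract (fewer features versus higher accuracy) can be illustrated with a small Pareto-dominance filter. This is only a sketch of the dominance comparison that multi-objective methods of this kind typically rely on; it is not an implementation of MOFA, MOICA, NSGA-II, or MOPSO, and it is not taken from the paper. The `Candidate` class, the function names, and all feature indices and accuracy values below are hypothetical illustration values, not results reported by the authors.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate feature subset with its measured accuracy."""
    features: FrozenSet[int]  # indices of the selected features
    accuracy: float           # e.g. a cross-validated accuracy (to maximize)

    @property
    def n_features(self) -> int:
        # First objective: number of selected features (to minimize).
        return len(self.features)


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` is no worse than `b` on both objectives
    and strictly better on at least one."""
    no_worse = a.n_features <= b.n_features and a.accuracy >= b.accuracy
    strictly_better = a.n_features < b.n_features or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up candidates for illustration only (not data from the paper).
candidates = [
    Candidate(frozenset({0, 1, 2, 3}), 0.95),
    Candidate(frozenset({1, 2}), 0.95),
    Candidate(frozenset({1}), 0.90),
    Candidate(frozenset({0, 1, 5}), 0.97),
    Candidate(frozenset({2, 4, 5}), 0.93),
]

front = pareto_front(candidates)
for c in front:
    print(sorted(c.features), c.accuracy)
```

In this toy input, the four-feature subset is dropped because a two-feature subset reaches the same accuracy, which is the kind of comparison a bi-objective feature selector has to make repeatedly. In a fuller pipeline, each accuracy would presumably come from training an SVM on the selected features and scoring it with 10-fold cross-validation after outlier removal.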
Pages: 47-57
Number of pages: 11