A bi-objective hybrid optimization algorithm to reduce noise and data dimension in diabetes diagnosis using support vector machines

Cited by: 31
Authors
Alirezaei, Mahsa [1]
Niaki, Seyed Taghi Akhavan [1]
Niaki, Seyed Armin Akhavan [2, 3]
Affiliations
[1] Sharif Univ Technol, Dept Ind Engn, POB 11155-9414,Azadi Ave, Tehran 1458889694, Iran
[2] West Virginia Univ, Dept Stat, Morgantown, WV USA
[3] Natl Energy Partners, Proc & Operat Analyt Engn Dept, Voorhees Township, NJ USA
Keywords
Diabetes diagnosis; Feature selection; Meta-heuristic algorithms; K-means algorithms; Support vector machine; FEATURE-SELECTION; CLASSIFICATION
DOI
10.1016/j.eswa.2019.02.037
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Diabetes mellitus is a medical condition that data miners study because of, among other reasons, the significant health complications it causes in affected people and its economic impact on healthcare networks. To find the main causes of the disease, researchers examine factors such as the patient's lifestyle and hereditary information. The goal of data mining in this context is to find patterns that make early detection and proper treatment of the disease easier. Because of the high volume of data involved in therapeutic contexts and disease diagnosis, providing the intended treatment within a short period of time becomes almost impossible. This justifies the use of pre-processing techniques and data reduction methods in such contexts, where clustering and meta-heuristic algorithms play important roles. In this paper, a method based on the k-means clustering algorithm is first used to detect and delete outliers. Then, to select significant and effective features, four bi-objective meta-heuristic algorithms are employed to choose the smallest number of significant features that gives the highest classification accuracy with support vector machines (SVM). In addition, 10-fold cross validation (CV) is used to validate the constructed model. Using real case data, it is concluded that the multi-objective firefly algorithm (MOFA) and the multi-objective imperialist competitive algorithm (MOICA), each with 100% classification accuracy, outperform the non-dominated sorting genetic algorithm (NSGA-II) and multi-objective particle swarm optimization (MOPSO), with accuracies of 98.2% and 94.6%, respectively. © 2019 Elsevier Ltd. All rights reserved.
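The abstract frames feature selection as a bi-objective problem: use as few features as possible while keeping classification accuracy as high as possible. The sketch below illustrates only the generic Pareto-dominance comparison that such bi-objective methods rely on; it is not the authors' implementation. The names `Candidate`, `dominates`, and `pareto_front`, and the sample values, are assumptions made for illustration and are not results from the paper.

```python
from typing import List, Tuple

# Each candidate is (number_of_selected_features, classification_accuracy).
# This representation is an assumption made for illustration.
Candidate = Tuple[int, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one.

    Objective 1: fewer features is better.
    Objective 2: higher accuracy is better.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical feature subsets, for illustration only (not data from the paper).
example = [(8, 0.95), (5, 0.95), (3, 0.90), (4, 0.88), (2, 0.80)]
front = pareto_front(example)
print(front)
```

In this hypothetical set, the 8-feature subset is dropped because a 5-feature subset reaches the same accuracy, and the 4-feature subset is dropped because a 3-feature subset is both smaller and more accurate. The remaining subsets are trade-offs that a decision maker would choose among.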
Pages: 47-57
Number of pages: 11