Hybrid Global Sensitivity Analysis Based Optimal Attribute Selection Using Classification Techniques by Machine Learning Algorithm

Cited by: 0
Authors
G. Saranya
A. Pravin
Affiliations
[1] Sathyabama Institute of Science and Technology, Department of Computer Science and Engineering, School of Computing
[2] SRM Institute of Science and Technology, Department of Networking and Communications, School of Computing
Keywords
Classification; Genetic algorithm; Particle swarm; Global sensitivity analysis; Random forest; Filter selection;
DOI
Not available
Abstract
Feature selection is a major step in data mining and classification. It improves classifier performance and reduces computation time by removing redundant and irrelevant information from the dataset. When all variables, from 10 to 100 of them, are processed for classification, computation takes longer and classifier efficiency is low. The best features can be selected with one of three methods: wrapper, filter, and embedded. In the wrapper method, feature selection relies on one of two strategies: sequential search or a heuristic approach. Sequential search builds the feature subset starting from the empty set. The heuristic approach determines the feature subset by optimizing an objective function. Heuristic approaches include optimization algorithms such as the genetic algorithm and particle swarm optimization. Only a few works have suggested the random forest classifier, owing to the hierarchical arrangement of the data. Among the three techniques, wrapper-based selection is able to produce high accuracy, which is due to its attribute selection method. This problem is overcome with the proposed global sensitivity analysis approach, in which an optimized filter technique is proposed for feature selection in classification. The selection of attributes for classification is performed in two stages. In the first stage, the filter selection approach is based on global sensitivity analysis. In the second stage, the dominant attributes from the first stage are determined through a wrapper approach using particle swarm optimization. Because of this multistage feature selection, the proposed approach can be applied to any type of machine learning application. The proposed particle swarm optimization based global sensitivity analysis (PSO-GSA) is performed on the Cleveland dataset using MATLAB. Its performance is evaluated in terms of accuracy, sensitivity, and specificity, and it is compared with the wrapper selection method. The proposed PSO-GSA outperforms wrapper selection, with an accuracy of 90% and a sensitivity of 94.74%. The computational time for the proposed GSA-based classification of heart disease using a random forest classifier is 0.7689 s, which is less than the computational time of classifiers with bagging and boosting techniques.
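The three evaluation measures named in the abstract can be computed from the counts of a binary confusion matrix. The following is a minimal Python sketch of the standard definitions, not the authors' MATLAB code; the function name classification_metrics and the counts passed to it are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from binary confusion counts.

    tp/fn: diseased cases predicted correctly/incorrectly.
    tn/fp: healthy cases predicted correctly/incorrectly.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    accuracy = (tp + tn) / (positives + negatives)
    sensitivity = tp / positives   # true positive rate
    specificity = tn / negatives   # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only; not results from the paper.
acc, sens, spec = classification_metrics(tp=40, fn=10, tn=45, fp=5)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
# prints: accuracy=85.00% sensitivity=80.00% specificity=90.00%
```

Reporting sensitivity and specificity alongside accuracy matters for a medical dataset such as Cleveland, because accuracy alone does not show how errors are split between missed disease cases and false alarms.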
Pages: 2305-2324
Page count: 19