Enhanced Prediction for Piezophilic Protein by Incorporating Reduced Set of Amino Acids Using Fuzzy-Rough Feature Selection Technique Followed by SMOTE

Authors
Tiwari, Anoop Kumar [1]
Shreevastava, Shivam [2]
Subbiah, Karthikeyan [1]
Som, Tanmoy [2]
Affiliations
[1] Inst Sci BHU, Dept Comp Sci, Varanasi, Uttar Pradesh, India
[2] Indian Inst Technol BHU, Dept Math Sci, Varanasi, Uttar Pradesh, India
Source
MATHEMATICS AND COMPUTING (ICMC 2018) | 2018 / Vol. 253
Keywords
Feature selection; Imbalanced dataset; SMOTE; Fuzzy-rough set; Random forest; SVM; Reduction
DOI
10.1007/978-981-13-2095-8_15
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
This paper investigates the learning performance of several machine learning algorithms when the fuzzy-rough feature selection (FRFS) technique is applied to optimally balanced training and testing sets of piezophilic and nonpiezophilic proteins. Applying FRFS followed by the Synthetic Minority Over-sampling Technique (SMOTE) at optimal balancing ratios, the best results are obtained with the random forest algorithm: sensitivity of 79.60%, specificity of 74.50%, average accuracy of 77.10%, AUC of 0.841, and MCC of 0.542. Input features are ranked by their ability to discriminate between piezophilic and nonpiezophilic proteins using a fuzzy-rough attribute evaluator. The results indicate that the performance of classification algorithms can be improved by using reduced, optimally balanced training and testing sets, which are obtained by selecting relevant and non-redundant features from the training sets with the FRFS approach and then suitably modifying the class distribution.
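The oversampling step named in the abstract can be illustrated with a minimal sketch of the basic SMOTE idea: a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbors. This is not the authors' implementation; the function name `smote`, its parameters, and the toy data are assumptions made for illustration only, and the FRFS step is not shown.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Sketch of SMOTE: interpolate between minority samples and their neighbors.

    minority    -- list of feature vectors (lists of floats) from the minority class
    n_synthetic -- number of synthetic samples to generate
    k           -- number of nearest minority neighbors considered per sample
    seed        -- seed for the random generator (hypothetical parameter)
    """
    rng = random.Random(seed)

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base sample, excluding the base itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: squared_distance(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority class: the four corners of the unit square (illustrative data only).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(toy_minority, n_synthetic=6, k=2)
```

With `k=2` on this toy data, each corner's nearest neighbors are its two adjacent corners, so every synthetic point falls on an edge of the square. In practice a maintained library implementation would normally be preferred over a hand-written one.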
Pages: 185-196
Page count: 12