iAFP-gap-SMOTE: An Efficient Feature Extraction Scheme Gapped Dipeptide Composition is Coupled with an Oversampling Technique for Identification of Antifreeze Proteins

Cited by: 29
Authors
Akbar, Shahid [1]
Hayat, Maqsood [1]
Kabir, Muhammad [1]
Iqbal, Muhammad [1]
Affiliations
[1] Abdul Wali Khan Univ, Dept Comp Sci, Mardan 23200, KP, Pakistan
Keywords
Antifreeze proteins; SMOTE; KNN; PNN; SVM; AFPs; AMINO-ACID-COMPOSITION; SUBCELLULAR-LOCALIZATION; EVOLUTIONARY INFORMATION; STRUCTURAL CLASS; GENERAL-FORM; PREDICTION; SELECTION; PEPTIDES; ENSEMBLE; MACHINE
DOI
10.2174/1570178615666180816101653
Chinese Library Classification number
O62 [Organic Chemistry]
Discipline classification codes
070303; 081704
Abstract
Antifreeze proteins (AFPs) play distinct roles in maintaining homeostasis in living organisms and protect their cells and bodies from freezing under extremely cold conditions. Owing to the high diversity of protein sequences and structures, discriminating AFPs from non-AFPs through experimental approaches is expensive and time-consuming. It is therefore highly desirable to develop an intelligent, high-throughput computational model that identifies AFPs quickly and accurately. To this end, a new predictor called "iAFP-gap-SMOTE" is proposed for the identification of AFPs. Protein sequences are represented using three numerical feature extraction schemes: Split Amino Acid Composition, G-gap Dipeptide Composition, and Reduced Amino Acid Alphabet Composition. On an imbalanced dataset, the classification hypothesis is usually biased towards the majority class. The Synthetic Minority Over-sampling Technique (SMOTE) is therefore employed to increase the number of minority-class instances and control this bias. A 10-fold cross-validation test is applied to assess the success rates of the "iAFP-gap-SMOTE" model. In the empirical investigation, the model obtained 95.02% accuracy. The comparison suggests that this accuracy is higher than that of existing techniques reported in the literature. The proposed "iAFP-gap-SMOTE" model may therefore be helpful to the research community and academia.
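The two central ideas named in the abstract, G-gap dipeptide composition and SMOTE-style oversampling, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract gives no formulas, so the sketch assumes the common definition of G-gap dipeptide composition (the normalized frequency of residue pairs separated by `gap` intervening positions, giving 400 features) and a simplified form of SMOTE (a synthetic sample is a random point on the segment between a minority sample and one of its `k` nearest minority neighbours). The function names `gap_dipeptide_composition` and `smote_like_oversample` are hypothetical.

```python
import random
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def gap_dipeptide_composition(sequence, gap=1):
    """Return 400 normalized frequencies of residue pairs that are
    separated by `gap` intervening positions (assumed definition).
    gap=0 reduces to the ordinary dipeptide composition."""
    counts = dict.fromkeys(PAIRS, 0)
    step = gap + 1
    total = 0
    for i in range(len(sequence) - step):
        pair = sequence[i] + sequence[i + step]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create `n_new` synthetic minority samples by linear interpolation
    between a random minority sample and one of its k nearest minority
    neighbours (simplified SMOTE; needs at least two minority samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        # Rank the remaining minority samples by squared Euclidean distance.
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    features = gap_dipeptide_composition("ACAC", gap=1)
    print(len(features), features[PAIRS.index("AA")], features[PAIRS.index("CC")])
    print(smote_like_oversample([[0.0, 0.0], [1.0, 1.0]], n_new=2, k=1))
```

In practice, oversampling would be applied only to the training folds of the 10-fold cross-validation so that synthetic samples do not leak into the test fold. Whether the paper does this is not stated in the abstract.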
Pages: 294-302
Page count: 9