Prediction of Protein Submitochondrial Locations by Incorporating Dipeptide Composition into Chou's General Pseudo Amino Acid Composition

Cited by: 68
Authors
Ahmad, Khurshid [1]
Waris, Muhammad [1]
Hayat, Maqsood [1]
Affiliation
[1] Abdul Wali Khan Univ Mardan, Dept Comp Sci, Mardan, Pakistan
Keywords
Mitochondria; Dipeptide composition; SAAC; SVM; SMOTE; PCA; TRANSLATION INITIATION SITE; SUBCELLULAR-LOCALIZATION; PHYSICOCHEMICAL FEATURES; WEB SERVER; IDENTIFICATION; PSEAAC; MODES; BIOINFORMATICS; DISCRIMINATION; ATTRIBUTES
DOI
10.1007/s00232-015-9868-8
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The mitochondrion is a key organelle of eukaryotic cells that provides energy for cellular activities. The submitochondrial locations of proteins play a crucial role in understanding various biological processes such as energy metabolism, programmed cell death, and ionic homeostasis. Determining submitochondrial locations with conventional experimental methods is expensive and time-consuming, and such methods cannot keep pace with the large number of protein sequences generated over the last few decades. An automated model for identifying the submitochondrial locations of proteins is therefore highly desirable. To this end, this study develops a fast, reliable, and accurate computational model. Several feature extraction methods were used: dipeptide composition (DPC), Split Amino Acid Composition (SAAC), and Composition and Translation. To overcome bias caused by class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to balance the datasets. Several classifiers were evaluated, including K-Nearest Neighbor (KNN), Probabilistic Neural Network (PNN), and support vector machine (SVM). The jackknife test was used to assess the classifiers on two benchmark datasets. Among the classifiers, SVM achieved the highest success rates in conjunction with the condensed DPC feature space: 95.20% accuracy on dataset SML3-317 and 95.11% on dataset SML3-983. The empirical results show that the proposed model achieved the highest success rates reported in the literature so far. The proposed model may be useful for future studies.
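The DPC descriptor mentioned in the abstract is commonly defined as the frequency of each of the 400 possible adjacent residue pairs, i.e. D(i) = N(i) / (L - 1), where N(i) is the count of dipeptide i and L is the sequence length. The following is a minimal sketch in Python, assuming the 20 standard amino acids and simply skipping pairs that contain a non-standard residue. The jackknife loop uses a 1-nearest-neighbour classifier (the K = 1 case of KNN) as a dependency-free stand-in for the SVM that performed best in the study, and it omits the SMOTE balancing and feature-condensation steps. The function names, the toy sequences, and the class labels are hypothetical and are not taken from the paper.

```python
from itertools import product
import math

# The 20 standard amino acids; this ordering is an arbitrary choice for the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq: str) -> list[float]:
    """Return the 400-D DPC vector: count of each adjacent pair / (L - 1).

    Pairs containing non-standard residues (e.g. 'X') are skipped but still
    count toward the L - 1 denominator in this sketch.
    """
    seq = seq.upper()
    vec = [0.0] * len(DIPEPTIDES)
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        return vec  # sequences shorter than 2 residues have no dipeptides
    for i in range(n_pairs):
        j = INDEX.get(seq[i:i + 2])
        if j is not None:
            vec[j] += 1.0
    return [c / n_pairs for c in vec]


def jackknife_accuracy(features, labels):
    """Leave-one-out (jackknife) accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, x in enumerate(features):
        best_label, best_dist = None, math.inf
        for j, y in enumerate(features):
            if i == j:
                continue  # the held-out sample is never compared with itself
            d = math.dist(x, y)  # Euclidean distance between DPC vectors
            if d < best_dist:
                best_dist, best_label = d, labels[j]
        correct += best_label == labels[i]
    return correct / len(features)


if __name__ == "__main__":
    # Hypothetical toy sequences and location labels, for illustration only.
    seqs = ["MKKLLA", "MKKLLV", "GGSSPP", "GGSSPA"]
    labels = ["inner membrane", "inner membrane", "matrix", "matrix"]
    X = [dipeptide_composition(s) for s in seqs]
    print(jackknife_accuracy(X, labels))  # prints 1.0
```

In the jackknife test each sample is held out once and predicted from all the others, so every sample serves as both training and test data exactly once; this is why it is favoured for small benchmark datasets, although it requires one model fit (here, one neighbour search) per sample.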
Pages: 293-304
Number of pages: 12