Prediction of Protein Submitochondrial Locations by Incorporating Dipeptide Composition into Chou's General Pseudo Amino Acid Composition

Cited by: 68

Authors:

Ahmad, Khurshid [1]

Waris, Muhammad [1]

Hayat, Maqsood [1]

Affiliations:

[1] Abdul Wali Khan Univ Mardan, Dept Comp Sci, Mardan, Pakistan

Source:

JOURNAL OF MEMBRANE BIOLOGY | 2016 / Vol. 249 / Issue 03

Keywords:

Mitochondria; Dipeptide composition; SAAC; SVM; SMOTE; PCA; TRANSLATION INITIATION SITE; SUBCELLULAR-LOCALIZATION; PHYSICOCHEMICAL FEATURES; WEB SERVER; IDENTIFICATION; PSEAAC; MODES; BIOINFORMATICS; DISCRIMINATION; ATTRIBUTES;

DOI:

10.1007/s00232-015-9868-8

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification code:

071010 ; 081704 ;

Abstract:

The mitochondrion is a key organelle of the eukaryotic cell and provides energy for cellular activities. The submitochondrial locations of proteins play a crucial role in understanding biological processes such as energy metabolism, programmed cell death, and ionic homeostasis. Predicting submitochondrial locations through conventional methods is expensive and time consuming because of the large number of protein sequences generated in the last few decades. An automated model for identifying the submitochondrial locations of proteins is therefore highly desirable. The current study was initiated to develop a fast, reliable, and accurate computational model for this purpose. Several feature extraction methods were used: dipeptide composition (DPC), Split Amino Acid Composition (SAAC), and Composition and Translation. To overcome class bias, the oversampling technique SMOTE was applied to balance the datasets. Several classifiers were evaluated, including K-Nearest Neighbor, Probabilistic Neural Network, and support vector machine (SVM). The jackknife test was applied to assess classifier performance on two benchmark datasets. Among the classifiers, SVM achieved the highest success rates in conjunction with the condensed feature space of DPC: 95.20 % accuracy on dataset SML3-317 and 95.11 % on dataset SML3-983. The empirical results indicate that the proposed model obtained the highest results reported in the literature so far. The proposed model is anticipated to be useful for future studies.
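As an illustration of two steps named in the abstract, the following is a minimal sketch of dipeptide composition (DPC) feature extraction and of a jackknife (leave-one-out) test. It is not the authors' implementation. The function names, the handling of non-standard residues, and the 1-nearest-neighbor classifier (used here only as a dependency-free stand-in for the SVM) are assumptions made for the example; SMOTE balancing and feature-space reduction are omitted.

```python
from itertools import product

# The 20 standard amino acids give 20 x 20 = 400 possible dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies.

    Assumption: overlapping pairs containing a non-standard residue
    (e.g. 'X') are skipped, and frequencies are normalized by the
    number of pairs actually counted.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


def nearest_neighbor(train_x, train_y, query):
    """Hypothetical stand-in classifier: label of the closest training vector."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(train_x)), key=lambda j: sq_dist(train_x[j], query))
    return train_y[best]


def jackknife_accuracy(features, labels, classify):
    """Leave-one-out test: each sample is predicted by a model built on all others."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)
```

In use, each protein sequence would be passed through `dipeptide_composition`, and the resulting vectors and their location labels would be handed to `jackknife_accuracy` together with the chosen classifier. The jackknife test is deterministic for a given dataset, which is one reason it is commonly used for benchmarking predictors of this kind.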

Citation

Pages: 293-304

Page count: 12

70 references in total

[1] Identification of Heat Shock Protein families and J-protein types by incorporating Dipeptide Composition into Chou's general PseAAC [J].