Supervised learning method for the prediction of subcellular localization of proteins using amino acid and amino acid pair composition

Cited by: 17
Authors
Habib, Tanwir [1 ]
Zhang, Chaoyang [2 ]
Yang, Jack Y. [3 ]
Yang, Mary Qu [4 ]
Deng, Youping [1 ]
Affiliations
[1] Univ So Mississippi, Dept Biol Sci, Hattiesburg, MS 39406 USA
[2] Univ So Mississippi, Sch Comp, Hattiesburg, MS 39406 USA
[3] Harvard Univ, Sch Med, Cambridge, MA 02140 USA
[4] NHGRI, NIH, US Dept Hlth & Human Serv, Bethesda, MD 20852 USA
Keywords
Support Vector Machine; Kernel Function; Radial Basis Function; Amino Acid Composition; Linear Kernel
DOI
10.1186/1471-2164-9-S1-S16
Chinese Library Classification (CLC) numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: Knowing where a protein occurs in the cell is an important step toward understanding its function, so it is highly desirable to predict a protein's subcellular location automatically from its sequence. The most studied prediction approaches rely on signal peptides, on sequence homology to proteins of known location, and on the correlation between proteins' overall amino acid composition and their location. Taking both amino acid composition and amino acid pair composition into account helps improve prediction accuracy.

Results: We constructed a dataset of protein sequences from the SWISS-PROT database and grouped them into 12 classes according to their subcellular locations. SVM models were trained to predict subcellular location from amino acid composition and from amino acid pair composition, and results were obtained by 10-fold cross-validation. The radial basis function (RBF) kernel outperformed the polynomial and linear kernels. Total prediction accuracy reached 71.8% with amino acid composition and 77.0% with amino acid pair composition. To examine the effect of the number of subcellular locations, we constructed two additional datasets with nine and five subcellular locations, for which total accuracy improved further to 79.9% and 85.66%, respectively.

Conclusions: We present a new SVM-based approach that uses amino acid and amino acid pair composition. The results show that data simulation and the inclusion of more protein features improve accuracy substantially. We also observed that the dataset must be constructed to account for the distribution of data across all classes.
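The abstract combines two sequence encodings with an RBF-kernel SVM evaluated by 10-fold cross-validation. The sketch below illustrates those ingredients in plain Python under stated assumptions; it is not the authors' implementation. It assumes that "amino acid pair composition" means the frequencies of the 400 adjacent dipeptides normalized by the number of counted pairs, and that non-standard residues are skipped. The function names (`aa_composition`, `aa_pair_composition`, `rbf_kernel`, `k_fold_indices`), the `gamma` default, and the toy sequences are hypothetical.

```python
import math
import random
from itertools import product

# The 20 standard amino acids (one-letter codes). Residues outside this set
# (e.g. X, B, Z) are skipped: an assumption, not stated in the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs: "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq: str) -> list[float]:
    """20-dim feature: fraction of each standard amino acid in `seq`."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def aa_pair_composition(seq: str) -> list[float]:
    """400-dim feature: fraction of each adjacent residue pair (dipeptide).

    Assumes "pair composition" means neighboring pairs only; pairs that
    contain a non-standard residue are not counted.
    """
    s = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(s) - 1):
        pair = s[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


def rbf_kernel(x: list[float], y: list[float], gamma: float = 1.0) -> float:
    """Radial basis function kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


if __name__ == "__main__":
    seq = "MKTAYIAKQR"  # toy sequence, not from the paper's dataset
    print(len(aa_composition(seq)), len(aa_pair_composition(seq)))  # 20 400
    x, y = aa_composition(seq), aa_composition("MKTAYLAKQR")
    print(rbf_kernel(x, y, gamma=0.5))
    print([len(f) for f in k_fold_indices(25)])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

In practice the feature vectors would more likely be passed to an existing SVM implementation (for example scikit-learn's `sklearn.svm.SVC` with `kernel="rbf"`) than to a hand-written kernel. The abstract does not say which toolkit, kernel parameters, or fold-assignment scheme the authors used; when class sizes are uneven, as the conclusions emphasize, a stratified split may be preferable to the plain shuffled split shown here.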
Pages: 9
Related papers (50 in total)
  • [1] Supervised learning method for the prediction of subcellular localization of proteins using amino acid and amino acid pair composition
    Habib, Tanwir
    Zhang, Chaoyang
    Yang, Jack Y.
    Yang, Mary Qu
    Deng, Youping
    BMC Genomics, 9
  • [2] Predicting subcellular localization of mycobacterial proteins by using Chou's pseudo amino acid composition
    Lin, Hao
    Ding, Hui
    Guo, Feng-Biao
    Zhang, An-Ying
    Huang, Jian
    PROTEIN AND PEPTIDE LETTERS, 2008, 15 (07) : 739 - 744
  • [3] Prediction of Subcellular Localization of Apoptosis Protein Using Chou's Pseudo Amino Acid Composition
    Lin, Hao
    Wang, Hao
    Ding, Hui
    Chen, Ying-Li
    Li, Qian-Zhong
    ACTA BIOTHEORETICA, 2009, 57 (03) : 321 - 330
  • [5] Prediction of the subcellular location of prokaryotic proteins based on a new representation of the amino acid composition
    Feng, ZP
    BIOPOLYMERS, 2001, 58 (05) : 491 - 499
  • [6] Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition
    Tamura, Takeyuki
    Akutsu, Tatsuya
    BMC Bioinformatics, 8
  • [7] Identification of proteins by their amino acid composition: An evaluation of the method
    Golaz, O
    Wilkins, MR
    Sanchez, JC
    Appel, RD
    Hochstrasser, DF
    Williams, KL
    ELECTROPHORESIS, 1996, 17 (03) : 573 - 579
  • [8] Prediction of Subcellular Location of Apoptosis Proteins Using Pseudo Amino Acid Composition: An Approach from Auto Covariance Transformation
    Liu, Taigang
    Zheng, Xiaoqi
    Wang, Chunhua
    Wang, Jun
    PROTEIN AND PEPTIDE LETTERS, 2010, 17 (10) : 1263 - 1269
  • [9] Prediction of Rat Protein Subcellular Localization with Pseudo Amino Acid Composition Based on Multiple Sequential Features
    Shi, Ruijia
    Xu, Cunshuan
    PROTEIN AND PEPTIDE LETTERS, 2011, 18 (06) : 625 - 633
  • [10] Prediction of Cyclin Proteins Using Chou's Pseudo Amino Acid Composition
    Mohabatkar, Hassan
    PROTEIN AND PEPTIDE LETTERS, 2010, 17 (10) : 1207 - 1214