Supervised learning method for the prediction of subcellular localization of proteins using amino acid and amino acid pair composition

Cited by: 17
Authors
Habib, Tanwir [1 ]
Zhang, Chaoyang [2 ]
Yang, Jack Y. [3 ]
Yang, Mary Qu [4 ]
Deng, Youping [1 ]
Affiliations
[1] Univ So Mississippi, Dept Biol Sci, Hattiesburg, MS 39406 USA
[2] Univ So Mississippi, Sch Comp, Hattiesburg, MS 39406 USA
[3] Harvard Univ, Sch Med, Cambridge, MA 02140 USA
[4] NHGRI, NIH, US Dept Hlth & Human Serv, Bethesda, MD 20852 USA
Keywords
Support Vector Machine; Kernel Function; Radial Basis Function; Amino Acid Composition; Linear Kernel
DOI
10.1186/1471-2164-9-S1-S16
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: Knowing where a protein resides in the cell is an important step in understanding its function, so it is highly desirable to predict a protein's subcellular location automatically from its sequence. The most studied approaches to predicting subcellular localization rely on signal peptides, on sequence homology, or on the correlation between location and a protein's total amino acid composition. Taking both amino acid composition and amino acid pair composition into account helps improve prediction accuracy. Results: We constructed a dataset of protein sequences from the SWISS-PROT database and divided them into 12 classes based on their subcellular locations. SVM modules were trained to predict the subcellular location from amino acid composition and amino acid pair composition. Results were calculated using 10-fold cross-validation. The Radial Basis Function (RBF) kernel outperformed the polynomial and linear kernels. Total prediction accuracy reached 71.8% for amino acid composition and 77.0% for amino acid pair composition. To observe the impact of the number of subcellular locations, we constructed two more datasets with nine and five subcellular locations; total accuracy improved further to 79.9% and 85.66%, respectively. Conclusions: A new SVM-based approach using amino acid and amino acid pair composition is presented. The results show that data simulation and taking more protein features into consideration improve accuracy considerably. The dataset also needs to be constructed with the distribution of data across all classes in mind.
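The two feature types named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: amino acid composition as the fraction of each of the 20 standard residues, and amino acid pair composition as the fraction of each of the 400 ordered adjacent residue pairs. The function names, the handling of non-standard residues, and the normalization are assumptions made for illustration, not details taken from the paper.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes) and all 400 ordered pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Return 20 fractions, one per standard amino acid.

    Assumption: non-standard symbols (e.g. X, B, Z) are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def pair_composition(seq):
    """Return 400 fractions, one per ordered pair of adjacent residues.

    Assumption: a pair is counted only if both residues are standard.
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


if __name__ == "__main__":
    example = "MKVLAAGIVGL"  # hypothetical sequence, for illustration only
    print(len(aa_composition(example)), len(pair_composition(example)))
```

Vectors like these could then be passed to any SVM implementation with an RBF kernel and evaluated by 10-fold cross-validation, as the abstract describes; the classifier settings used by the authors are not given here.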
Pages: 9