Supervised learning method for the prediction of subcellular localization of proteins using amino acid and amino acid pair composition

Cited by: 17
Authors
Habib, Tanwir [1 ]
Zhang, Chaoyang [2 ]
Yang, Jack Y. [3 ]
Yang, Mary Qu [4 ]
Deng, Youping [1 ]
Affiliations
[1] Univ So Mississippi, Dept Biol Sci, Hattiesburg, MS 39406 USA
[2] Univ So Mississippi, Sch Comp, Hattiesburg, MS 39406 USA
[3] Harvard Univ, Sch Med, Cambridge, MA 02140 USA
[4] NHGRI, NIH, US Dept Hlth & Human Serv, Bethesda, MD 20852 USA
Keywords
Support Vector Machine; Kernel Function; Radial Basis Function; Amino Acid Composition; Linear Kernel;
DOI
10.1186/1471-2164-9-S1-S16
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Determining where a protein resides in the cell is an important step in understanding its function, so it is highly desirable to predict a protein's subcellular location automatically from its sequence. The most widely studied approaches to predicting subcellular localization rely on signal peptides, on sequence homology, and on the correlation between a protein's total amino acid composition and its location. Taking both amino acid composition and amino acid pair composition into consideration helps improve prediction accuracy.
Results: We constructed a dataset of protein sequences from the SWISS-PROT database and divided them into 12 classes based on their subcellular locations. SVM modules were trained to predict the subcellular location from amino acid composition and amino acid pair composition. Results were calculated after 10-fold cross-validation. The Radial Basis Function (RBF) kernel outperformed the polynomial and linear kernel functions. Total prediction accuracy reached 71.8% for amino acid composition and 77.0% for amino acid pair composition. To observe the impact of the number of subcellular locations, we constructed two more datasets with nine and five subcellular locations; total accuracy improved further to 79.9% and 85.66%, respectively.
Conclusions: A new SVM-based approach built on amino acid and amino acid pair composition is presented. The results show that data simulation and taking more protein features into consideration improve accuracy to a great extent. It was also noticed that the dataset needs to be crafted to take account of the distribution of data across all the classes.
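The two feature types named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the handling of non-standard residues, the use of adjacent residues as "pairs", and the `gamma` default are all assumptions made for illustration. Amino acid composition is taken here as the fraction of each of the 20 standard residues (20 features), and amino acid pair composition as the fraction of each adjacent residue pair (400 features). The RBF kernel is shown only as the standard formula exp(-gamma * ||x - y||^2).

```python
from itertools import product
from math import exp

# The 20 standard amino acids (one-letter codes) and all 400 ordered pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard residue (hypothetical helper)."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def pair_composition(seq):
    """Return 400 fractions, one per adjacent residue pair (hypothetical helper).

    Assumption: a pair is counted only when both adjacent residues are
    standard amino acids; pairs containing other symbols are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


def rbf_kernel(x, y, gamma=1.0):
    """Standard RBF kernel between two equal-length feature vectors."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    # Toy sequences, not taken from the paper's dataset.
    a = pair_composition("MKTAYIAKQR")
    b = pair_composition("MKTAYLAKQR")
    print(len(a), rbf_kernel(a, b))
```

Feature vectors like these could then be passed to any SVM implementation; the choice of library and hyperparameters is left open here because the record does not specify them.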
Pages: 9