Supervised learning method for the prediction of subcellular localization of proteins using amino acid and amino acid pair composition

Cited by: 17
Authors
Habib, Tanwir [1 ]
Zhang, Chaoyang [2 ]
Yang, Jack Y. [3 ]
Yang, Mary Qu [4 ]
Deng, Youping [1 ]
Affiliations
[1] Univ So Mississippi, Dept Biol Sci, Hattiesburg, MS 39406 USA
[2] Univ So Mississippi, Sch Comp, Hattiesburg, MS 39406 USA
[3] Harvard Univ, Sch Med, Cambridge, MA 02140 USA
[4] NHGRI, NIH, US Dept Hlth & Human Serv, Bethesda, MD 20852 USA
Keywords
Support Vector Machine; Kernel Function; Radial Basis Function; Amino Acid Composition; Linear Kernel;
DOI
10.1186/1471-2164-9-S1-S16
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Determining where a protein occurs in the cell is an important step in understanding its function. It is highly desirable to predict a protein's subcellular location automatically from its sequence. The most studied approaches to predicting subcellular localization rely on signal peptides, on sequence homology, and on the correlation between the total amino acid compositions of proteins and their locations. Taking both amino acid composition and amino acid pair composition into consideration helps improve prediction accuracy. Results: We constructed a dataset of protein sequences from the SWISS-PROT database and divided them into 12 classes based on their subcellular locations. SVM modules were trained to predict the subcellular location from amino acid composition and amino acid pair composition. Results were computed using 10-fold cross-validation. The radial basis function (RBF) kernel outperformed the polynomial and linear kernels. Total prediction accuracy reached 71.8% for amino acid composition and 77.0% for amino acid pair composition. To observe the impact of the number of subcellular locations, we constructed two more datasets with nine and five subcellular locations. Total accuracy improved further to 79.9% and 85.66%, respectively. Conclusions: A new SVM-based approach built on amino acid and amino acid pair composition is presented. The results show that data simulation and taking more protein features into consideration improve accuracy considerably. We also observed that the dataset needs to be constructed to take account of the distribution of data across all classes.
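The two feature types named in the abstract can be illustrated with a minimal Python sketch. It assumes the common definitions: amino acid composition as a 20-dimensional vector of residue fractions, and amino acid pair composition as a 400-dimensional vector of adjacent-pair fractions. The abstract does not state the exact normalization or how non-standard residues were handled, so the function names and the choice to skip non-standard residues are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order that defines feature positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, e.g. "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 fractions: count of each residue / number of standard residues.

    Assumption: residues outside the 20 standard letters are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / total for a in AMINO_ACIDS]


def pair_composition(seq):
    """Return 400 fractions: count of each adjacent pair / number of valid pairs.

    Assumption: a pair is counted only if both residues are standard.
    """
    seq = seq.upper()
    pairs = [
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS
    ]
    total = len(pairs)
    if total == 0:
        return [0.0] * len(PAIRS)
    counts = Counter(pairs)
    return [counts[p] / total for p in PAIRS]


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # hypothetical short sequence, for illustration only
    print(len(amino_acid_composition(example)), len(pair_composition(example)))
```

Vectors of this kind would then be passed to a multi-class SVM with an RBF kernel and evaluated by 10-fold cross-validation, as the abstract describes; that training step is not shown here.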
Pages: 9
Related papers
50 in total
  • [31] Identification of Human Enzymes Using Amino Acid Composition and the Composition of k-Spaced Amino Acid Pairs
    Zhang, Lifu
    Dong, Benzhi
    Teng, Zhixia
    Zhang, Ying
    Juan, Liran
    BIOMED RESEARCH INTERNATIONAL, 2020, 2020
  • [32] Amino Acid Composition in Various Types of Nucleic Acid-Binding Proteins
    Bartas, Martin
    Cerven, Jiri
    Guziurova, Simona
    Slychko, Kristyna
    Pecinka, Petr
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2021, 22 (02) : 1 - 12
  • [33] Prediction of the partitioning behaviour of proteins in aqueous two-phase systems using only their amino acid composition
    Salgado, J. Cristian
    Andrews, Barbara A.
    Ortuzar, Maria Fernanda
    Asenjo, Juan A.
    JOURNAL OF CHROMATOGRAPHY A, 2008, 1178 (1-2) : 134 - 144
  • [34] Supersecondary Structure Prediction Using Chou's Pseudo Amino Acid Composition
    Zou, Dongsheng
    He, Zhongshi
    He, Jingyuan
    Xia, Yuxian
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 2011, 32 (02) : 271 - 278
  • [35] Relation between amino acid composition and cellular location of proteins
    Cedano, J
    Aloy, P
    PerezPons, JA
    Querol, E
    JOURNAL OF MOLECULAR BIOLOGY, 1997, 266 (03) : 594 - 600
  • [36] Production, Purification, and Study of the Amino Acid Composition of Microalgae Proteins
    Andreeva, Anna
    Budenkova, Ekaterina
    Babich, Olga
    Sukhikh, Stanislav
    Ulrikh, Elena
    Ivanova, Svetlana
    Prosekov, Alexander
    Dolganyuk, Vyacheslav
    MOLECULES, 2021, 26 (09)
  • [37] Prediction of protein structural class by amino acid and polypeptide composition
    Luo, RY
    Feng, ZP
    Liu, JK
    EUROPEAN JOURNAL OF BIOCHEMISTRY, 2002, 269 (17) : 4219 - 4225
  • [38] Prediction of nuclear receptors with optimal pseudo amino acid composition
    Gao, Qing-Bin
    Jin, Zhi-Chao
    Ye, Xiao-Fei
    Wu, Cheng
    He, Jia
    ANALYTICAL BIOCHEMISTRY, 2009, 387 (01) : 54 - 59
  • [39] The Prediction of Succinylation Site in Protein by Analyzing Amino Acid Composition
    Van-Minh Bui
    Van-Nui Nguyen
    ADVANCES IN INFORMATION AND COMMUNICATION TECHNOLOGY, 2017, 538 : 633 - 642
  • [40] Predicting protein subnuclear localization using GO-amino-acid composition features
    Huang, Wen-Lin
    Tung, Chun-Wei
    Huang, Hui-Ling
    Ho, Shinn-Ying
    BIOSYSTEMS, 2009, 98 (02) : 73 - 79