Prediction of Essential Proteins in Prokaryotes by Incorporating Various Physico-chemical Features into the General form of Chou's Pseudo Amino Acid Composition

Cited by: 26
Authors
Sarangi, Aditya Narayan [1]
Lohani, Mohtashim [2]
Aggarwal, Rakesh [1]
Affiliations
[1] Sanjay Gandhi Postgrad Inst Med Sci, Sch Telemed & Biomed Informat, Biomed Informat Ctr, Lucknow 226014, Uttar Pradesh, India
[2] Integral Univ, Dept Biotechnol, Lucknow 226026, Uttar Pradesh, India
Keywords
Machine learning; support vector machine; essential protein; classification; SUPPORT VECTOR MACHINES; OUTER-MEMBRANE PROTEINS; POTENTIAL-DRUG TARGETS; SUBCELLULAR-LOCALIZATION; CRYSTALLIZATION PROPENSITY; NETWORK TOPOLOGY; WEB SERVER; SEQUENCE; PSEAAC; IDENTIFICATION;
DOI
10.2174/0929866511320070008
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010 ; 081704 ;
Abstract
Predicting the essential proteins of a pathogenic organism is key to identifying potential drug targets, because inhibiting these proteins would be fatal for the pathogen. Identifying them experimentally requires complex techniques that are expensive and time-consuming. We used a Support Vector Machine (SVM) algorithm to develop a classifier for in silico prediction of prokaryotic essential proteins based on the physico-chemical properties of their amino acid sequences. The classifier was built on a set of 10 physico-chemical descriptor vectors (DVs) and 4 hybrid DVs calculated from amino acid sequences using the PROFEAT and PseAAC servers. It was trained on a data set of 500 known essential and 500 non-essential proteins (n=1,000) and evaluated on an external validation set of 3,462 essential and 5,538 non-essential proteins (n=9,000). The performance of each DV set was evaluated individually. DV set 13, which combines the composition, transition and distribution descriptor set with the hybrid autocorrelation descriptor set, achieved an accuracy of 91.2% in 10-fold cross-validation on the training set and 89.7% on the external validation set, and accuracies of 91.8% and 88.1% on a different yeast protein dataset. Our results indicate that this classification model can be used to identify novel prokaryotic essential proteins.
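To illustrate the kind of sequence-derived feature the abstract refers to, the following is a minimal sketch of the composition and transition parts of a composition, transition and distribution descriptor for one property (hydrophobicity). It is not the authors' pipeline: the three-class residue grouping, the function names, and the omission of the distribution part are assumptions made for illustration, and the abstract states that the actual descriptors were computed with the PROFEAT and PseAAC servers.

```python
from itertools import combinations

# ASSUMPTION: a three-class hydrophobicity grouping often used for this kind
# of descriptor. The record does not state which groupings the authors used.
HYDROPHOBICITY_CLASSES = {
    "1": "RKEDQN",   # polar
    "2": "GASTPHY",  # neutral
    "3": "CLVIMFW",  # hydrophobic
}

# Reverse lookup: residue letter -> class label.
_LOOKUP = {aa: label
           for label, members in HYDROPHOBICITY_CLASSES.items()
           for aa in members}


def encode(sequence):
    """Replace each residue with its class label; unknown letters are skipped."""
    return "".join(_LOOKUP[aa] for aa in sequence.upper() if aa in _LOOKUP)


def composition(encoded):
    """Fraction of residues that fall in each class."""
    n = len(encoded)
    return [encoded.count(label) / n if n else 0.0
            for label in sorted(HYDROPHOBICITY_CLASSES)]


def transition(encoded):
    """Fraction of adjacent residue pairs that switch between two given classes."""
    pairs = list(zip(encoded, encoded[1:]))
    total = len(pairs)
    values = []
    for a, b in combinations(sorted(HYDROPHOBICITY_CLASSES), 2):
        count = sum(1 for pair in pairs if pair in ((a, b), (b, a)))
        values.append(count / total if total else 0.0)
    return values


def ct_descriptor(sequence):
    """Hypothetical helper: 3 composition values followed by 3 transition values."""
    encoded = encode(sequence)
    return composition(encoded) + transition(encoded)
```

Calling `ct_descriptor` on a protein sequence yields a fixed-length vector of six numbers regardless of sequence length, which is what makes such descriptors usable as input to a classifier such as an SVM.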
Pages: 781-795
Number of pages: 15
Related papers
50 in total
  • [21] PECM: Prediction of extracellular matrix proteins using the concept of Chou's pseudo amino acid composition
    Zhang, Jian
    Sun, Pingping
    Zhao, Xiaowei
    Ma, Zhiqiang
    JOURNAL OF THEORETICAL BIOLOGY, 2014, 363 : 412 - 418
  • [22] Predicting protein submitochondrial locations by incorporating the pseudo-position specific scoring matrix into the general Chou's pseudo-amino acid composition
    Qiu, Wenying
    Li, Shan
    Cui, Xiaowen
    Yu, Zhaomin
    Wang, Minghui
    Du, Junwei
    Peng, Yanjun
    Yu, Bin
    JOURNAL OF THEORETICAL BIOLOGY, 2018, 450 : 86 - 103
  • [23] GOASVM: A subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou's pseudo-amino acid composition
    Wan, Shibiao
    Mak, Man-Wai
    Kung, Sun-Yuan
    JOURNAL OF THEORETICAL BIOLOGY, 2013, 323 : 40 - 48
  • [24] Prediction of GABAA receptor proteins using the concept of Chou's pseudo-amino acid composition and support vector machine
    Mohabatkar, Hassan
    Beigi, Majid Mohammad
    Esmaeili, Abolghasem
    JOURNAL OF THEORETICAL BIOLOGY, 2011, 281 (01) : 18 - 23
  • [25] Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou's general PseAAC
    Meher, Prabina Kumar
    Sahu, Tanmaya Kumar
    Saini, Varsha
    Rao, Atmakuri Ramakrishna
    SCIENTIFIC REPORTS, 2017, 7
  • [26] Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition
    Zhu, Pan-Pan
    Li, Wen-Chao
    Zhong, Zhe-Jin
    Deng, En-Ze
    Ding, Hui
    Chen, Wei
    Lin, Hao
    MOLECULAR BIOSYSTEMS, 2015, 11 (02) : 558 - 563
  • [27] Using radial basis function on the general form of Chou's pseudo amino acid composition and PSSM to predict subcellular locations of proteins with both single and multiple sites
    Huang, Chao
    Yuan, Jingqi
    BIOSYSTEMS, 2013, 113 (01) : 50 - 57
  • [28] PSNO: Predicting Cysteine S-Nitrosylation Sites by Incorporating Various Sequence-Derived Features into the General Form of Chou's PseAAC
    Zhang, Jian
    Zhao, Xiaowei
    Sun, Pingping
    Ma, Zhiqiang
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2014, 15 (07) : 11204 - 11219
  • [29] Predicting Protein Solubility by the General Form of Chou's Pseudo Amino Acid Composition: Approached from Chaos Game Representation and Fractal Dimension
    Niu, Xiao-Hui
    Hu, Xue-Hai
    Shi, Feng
    Xia, Jing-Bo
    PROTEIN AND PEPTIDE LETTERS, 2012, 19 (09) : 940 - 948
  • [30] Dual-Layer Wavelet SVM for Predicting Protein Structural Class Via the General Form of Chou's Pseudo Amino Acid Composition
    Chen, Chao
    Shen, Zhi-Bin
    Zou, Xiao-Yong
    PROTEIN AND PEPTIDE LETTERS, 2012, 19 (04) : 422 - 429