Prediction of endoplasmic reticulum resident proteins using fragmented amino acid composition and support vector machine

Cited by: 27
Authors
Kumar, Ravindra [1, 2]
Kumari, Bandana [1 ]
Kumar, Manish [1 ]
Affiliations
[1] Univ Delhi, Dept Biophys, South Campus, New Delhi, India
[2] Agr Res Org, Newe Yaar Res Ctr, Ramat Yishay, Israel
Keywords
Pseudo amino acid composition; Amino acid composition; Split amino acid composition; Compositional difference; Leave-one-out cross-validation; SUBCELLULAR-LOCALIZATION; TRANSMEMBRANE PROTEINS; LOCATION PREDICTION; MEMBRANE-PROTEINS; QUALITY-CONTROL; ROC CURVE; GOLGI; CLASSIFIER; SVM; RETENTION;
DOI
10.7717/peerj.3561
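Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract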
Background. The endoplasmic reticulum plays an important role in many cellular processes, including protein synthesis, folding and post-translational processing of newly synthesized proteins. It is also the site of quality control for misfolded proteins and the entry point of extracellular proteins into the secretory pathway. Hence, at any given time, the endoplasmic reticulum contains two different cohorts of proteins: (i) proteins involved in endoplasmic reticulum-specific functions, which reside in the lumen of the endoplasmic reticulum and are called endoplasmic reticulum resident proteins, and (ii) proteins that are in the process of moving to the extracellular space. Thus, endoplasmic reticulum resident proteins must somehow be distinguished from newly synthesized secretory proteins, which pass through the endoplasmic reticulum on their way out of the cell. Only approximately 50% of the proteins used as training data in this study had an endoplasmic reticulum retention signal, which shows that these signals are not necessarily present in all endoplasmic reticulum resident proteins. This also strongly indicates a role for additional factors in the retention of endoplasmic reticulum-specific proteins inside the endoplasmic reticulum.
Methods. This is a support vector machine based method in which we used different forms of protein features as inputs to the support vector machine to develop the prediction models. During training, the leave-one-out approach to cross-validation was used. Maximum performance was obtained with a combination of the amino acid compositions of different parts of the proteins.
Results. In this study, we report a novel support vector machine based method for predicting endoplasmic reticulum resident proteins, named ERPred. During training, we achieved a maximum accuracy of 81.42% with the leave-one-out approach to cross-validation. When evaluated on an independent dataset, ERPred achieved a sensitivity of 72.31% and a specificity of 83.69%. We have also annotated six different proteomes to predict the candidate endoplasmic reticulum resident proteins in them. A webserver, ERPred, was developed to make the method available to the scientific community; it can be accessed at http://proteininformatics.org/mkumar/erpred/index.html.
Discussion. We found that out of the 124 proteins in the training dataset, only 66 had endoplasmic reticulum retention signals, which shows that these signals are not an absolute necessity for endoplasmic reticulum resident proteins to remain inside the endoplasmic reticulum. This observation also strongly indicates a role for additional factors in the retention of proteins inside the endoplasmic reticulum. Our proposed predictor, ERPred, is a signal-independent tool. It is tuned for the prediction of endoplasmic reticulum resident proteins even if the query protein does not contain a specific ER-retention signal.
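The "combination of amino acid compositions of different parts of proteins" mentioned in the Methods can be illustrated with a minimal sketch of a split (fragmented) amino acid composition feature vector. This is not the ERPred implementation: the function names and the fragment lengths `n_term` and `c_term` are hypothetical choices made only for illustration, since the abstract does not state how the proteins were fragmented.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(fragment):
    """Return the fraction of each standard residue in one fragment (20 values)."""
    counts = Counter(fragment)
    total = sum(counts[a] for a in AMINO_ACIDS)  # non-standard letters are ignored
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def split_composition(sequence, n_term=25, c_term=25):
    """Concatenate the compositions of the N-terminal, middle and C-terminal parts.

    n_term and c_term are assumed fragment lengths, not values from the paper.
    The result is a 60-dimensional vector (3 fragments x 20 residues).
    """
    sequence = sequence.upper()
    head = sequence[:n_term]
    tail = sequence[-c_term:] if c_term else ""
    middle = sequence[n_term:len(sequence) - c_term]  # empty for short sequences
    return composition(head) + composition(middle) + composition(tail)


# Toy sequence: 30 alanines, 40 cysteines, 30 aspartates.
features = split_composition("A" * 30 + "C" * 40 + "D" * 30, n_term=30, c_term=30)
print(len(features))
```

A vector of this kind could then be passed to any support vector machine implementation and evaluated with leave-one-out cross-validation, for example scikit-learn's `SVC` together with `LeaveOneOut`; that pairing is an assumption here, as the abstract does not name the software the authors used.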
Pages: 24