MSLVP: prediction of multiple subcellular localization of viral proteins using a support vector machine

Cited by: 18
Authors
Thakur, Anamika [1]
Rajput, Akanksha [1]
Kumar, Manoj [1]
Affiliations
[1] CSIR, Inst Microbial Technol, Bioinformat Ctr, Sect 39-A, Chandigarh 160036, India
Keywords
GENERAL-FORM; LOCATION PREDICTION; FUSION CLASSIFIER; LABEL PREDICTOR; VIRUS PROTEINS; WEB SERVER; MPLOC; REGRESSION; SINGLE; HOST;
DOI
10.1039/c6mb00241b
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Knowledge of the subcellular location (SCL) of viral proteins in the host cell is important for understanding their function in depth. Therefore, we have developed "MSLVP", a two-tier prediction algorithm for predicting multiple SCLs of viral proteins. For this study, comprehensive data sets of viral proteins with experimentally validated SCL annotation were collected from UniProt. A non-redundant (90% sequence identity) data set of 3480 viral proteins belonging to single (2715), double (391) and multiple (374) sites was employed. Additionally, 1687 viral proteins (30% sequence identity) were categorised into single (1366), double (167) and multiple (154) sites. Single, double and multiple locations further comprised eight, four and six categories, respectively. Viral protein locations include the nucleus, cytoplasm, endoplasmic reticulum, extracellular, single-pass membrane, multi-pass membrane, capsid, remaining others and combinations thereof. Support vector machine based models were developed using sequence features such as amino acid composition, dipeptide composition, physicochemical properties and their hybrids. We employed "one-versus-one" as well as "one-versus-other" strategies for multiclass classification. The "one-versus-one" approach performed better than the "one-versus-other" approach during 10-fold cross-validation. For the 90% data set, we achieved an accuracy, a Matthews correlation coefficient (MCC) and a receiver operating characteristic (ROC) of 99.99%, 1.00, 1.00; 100.00%, 1.00, 1.00; and 99.90%, 1.00, 1.00 for single, double and multiple locations, respectively. Similar results were achieved for the 30% sequence identity data set. Predictive models for each SCL performed equally well on the independent data set. The MSLVP web server (http://bioinfo.imtech.res.in/manojk/mslvpred/) can predict subcellular locations, i.e. single (8; including single- and multi-pass membrane), double (4) and multiple (6). This would be helpful for elucidating the functional annotation of viral proteins and potential drug targets.
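The sequence features named in the abstract can be illustrated with a short sketch. The Python below is not the MSLVP code; it is a minimal illustration, assuming the conventional definitions of amino acid composition (20 percentage values) and dipeptide composition (400 percentage values), with a simple concatenation standing in for a "hybrid" feature vector. All function names are hypothetical, and physicochemical properties are left out.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in the same fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def _standard_residues(sequence):
    """Upper-case the sequence and keep only the 20 standard residues.

    Simplification (an assumption of this sketch): dropping a non-standard
    residue makes its two neighbours adjacent in the filtered string.
    """
    return "".join(r for r in sequence.upper() if r in AMINO_ACIDS)


def amino_acid_composition(sequence):
    """Return 20 values: percentage of each amino acid in the sequence."""
    residues = _standard_residues(sequence)
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return 400 values: percentage of each overlapping residue pair."""
    residues = _standard_residues(sequence)
    total = len(residues) - 1  # number of overlapping dipeptides
    if total < 1:
        return [0.0] * len(DIPEPTIDES)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        counts[residues[i:i + 2]] += 1
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


def hybrid_features(sequence):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(sequence) + dipeptide_composition(sequence)


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # made-up sequence, for illustration only
    print(len(hybrid_features(example)))
```

Vectors of this kind would then be passed to a support vector machine. Under a one-versus-one scheme, k classes require k(k-1)/2 pairwise classifiers, so the eight single-location categories would involve 28 binary models.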
Pages: 2572-2586
Page count: 15