SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity

Cited by: 149
Authors
Li, Ying Hong [1, 2]
Xu, Jing Yu [1, 2, 5]
Tao, Lin [1, 2, 3]
Li, Xiao Feng [1, 2]
Li, Shuang [1, 2]
Zeng, Xian [3]
Chen, Shang Ying [3]
Zhang, Peng [3]
Qin, Chu [3]
Zhang, Cheng [3]
Chen, Zhe [4]
Zhu, Feng [1, 2]
Chen, Yu Zong [3]
Affiliations
[1] Chongqing Univ, Innovat Drug Res & Bioinformat Grp, Innovat Drug Res Ctr, Chongqing 401331, Peoples R China
[2] Chongqing Univ, Sch Pharmaceut Sci, Chongqing 401331, Peoples R China
[3] Natl Univ Singapore, Dept Pharm, Bioinformat & Drug Discovery Grp, Singapore 117543, Singapore
[4] Zhejiang Chinese Med Univ, Zhejiang Hosp Tradit Chinese Med, Zhejiang Key Lab Gastrointestinal Pathophysiol, Hangzhou, Zhejiang, Peoples R China
[5] Beijing Inst Technol, Sch Math & Stat, Beijing, Peoples R China
Source
PLOS ONE | 2016 / Vol. 11 / No. 8
Keywords
AMINO-ACID-COMPOSITION; SUPPORT VECTOR MACHINE; DNA-BINDING PROTEINS; SECONDARY STRUCTURE; PHYSICOCHEMICAL FEATURES; CLASSIFICATION; IDENTIFICATION; UPDATE; RESOURCE; DATABASE;
DOI
10.1371/journal.pone.0155290
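Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract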
Knowledge of protein function is important for biological, medical and therapeutic studies, but the functions of many proteins are still unknown. Improved functional prediction methods are therefore needed. Our SVM-Prot web-server employs a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complements similarity-based and other methods in predicting diverse classes of proteins, including distantly related proteins and homologous proteins of different functions. Since its publication in 2003, we have made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors for protein representation, (3) improved predictive performance due to more enriched training datasets and a greater variety of protein descriptors, (4) a newly integrated BLAST analysis option for assessing proteins in the SVM-Prot-predicted functional families that are similar in sequence to a query protein, and (5) a newly added batch submission option for supporting the classification of multiple proteins. Moreover, two more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added to facilitate collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.
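The abstract describes assigning proteins to functional families from sequence-derived descriptors rather than from sequence similarity, with K nearest neighbor (kNN) as one of the added methods. As a rough illustration only (not SVM-Prot's actual descriptors, training data, or code), the sketch below represents each sequence by its amino-acid composition, one of the descriptor types named in the keywords, and labels a query by majority vote among its k nearest training sequences. The family labels, toy sequences, the choice of k = 3, and the use of Euclidean distance are all hypothetical assumptions.

```python
import math
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> list[float]:
    """Return the fraction of each standard amino acid in seq.

    Letters outside the 20 standard residues (e.g. X, B, Z) are ignored.
    """
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict(query: str, training: list[tuple[str, str]], k: int = 3) -> str:
    """Label query by majority vote among its k nearest training sequences.

    Distance is the Euclidean distance between composition vectors
    (an assumption for this sketch).
    """
    if not training:
        raise ValueError("training set is empty")
    q = aa_composition(query)
    ranked = sorted(
        (math.dist(q, aa_composition(seq)), label) for seq, label in training
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: invented sequences and family labels.
TRAINING = [
    ("KKKRRRKK", "family_A"),
    ("KRKRKKRA", "family_A"),
    ("RKKKRKGK", "family_A"),
    ("LLALLAVL", "family_B"),
    ("ALLVLALL", "family_B"),
    ("LALLLIAL", "family_B"),
]

print(knn_predict("KRKKRKKA", TRAINING))  # → family_A
```

Because the composition vector ignores residue order, two sequences with low pairwise identity can still land close together, which is the sense in which such descriptor-based classifiers work "irrespective of similarity". The abstract notes that SVM-Prot itself relies on more diverse descriptors and enriched training datasets.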
Pages: 14
References: 80