MAPS: An integrated system for protein sequence annotation using support vector machine

Cited by: 0
Authors
Wang, Jung-Ying [1, 3]
Liu, Cheng-Kang [1]
Lee, Hahn-Ming [1, 2]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
[2] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
[3] Lunghwa Univ Sci & Technol, Dept Multimedia & Game Sci, Tao Yuan 333, Taiwan
Keywords
protein annotation; support vector machine; sequence similarity; gene ontology (GO)
DOI
10.1080/02533839.2008.9671432
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
An integrated environment for biological data is valuable for the functional annotation of protein sequences. In this paper, we present a protein sequence annotation system, named MAPS (Multiple Annotation for Protein Sequences), which provides a mechanism to extract multiple annotations from various types of biological data, including SwissProt keywords, InterPro signatures, and GO terms. Furthermore, MAPS can automatically eliminate the annotation errors generated by a pre-trained SVM classifier. It assigns an annotation to the protein sequence in question by considering not only a single similar protein but also all similar proteins with the annotation. In other words, we take into account the evolutionary information of the protein of interest to reduce the erroneous annotations inferred from weak sequence similarities and from sequence identities in non-functional segments. The experimental results show that erroneous annotations can be eliminated effectively while keeping high accuracy for different types of annotations.
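The idea of judging an annotation by all similar proteins that carry it, rather than by a single best hit, can be sketched as follows. This is a minimal illustration and not the MAPS implementation: the `Hit` record, the three summary features, and the threshold rule in `accept` are all assumptions made for this example. In particular, `accept` is a hand-written placeholder standing in for a trained SVM decision, and the identity values and hit annotations are invented.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One similar protein found for the query (hypothetical record layout)."""
    protein_id: str
    identity: float          # fractional sequence identity to the query, 0..1
    annotations: frozenset   # e.g. GO terms, SwissProt keywords, InterPro signatures


def annotation_features(hits):
    """Summarise, for each candidate annotation, every hit that carries it."""
    by_annotation = defaultdict(list)
    for hit in hits:
        for term in hit.annotations:
            by_annotation[term].append(hit.identity)

    features = {}
    for term, identities in by_annotation.items():
        features[term] = {
            # Fraction of all similar proteins that share this annotation.
            "support": len(identities) / len(hits),
            "max_identity": max(identities),
            "mean_identity": sum(identities) / len(identities),
        }
    return features


def accept(feature, min_support=0.5, min_mean_identity=0.4):
    """Placeholder decision rule (assumed thresholds), standing in for an SVM."""
    return (feature["support"] >= min_support
            and feature["mean_identity"] >= min_mean_identity)


# Invented example data: three similar proteins with overlapping annotations.
hits = [
    Hit("P1", 0.90, frozenset({"GO:0005524", "GO:0016301"})),
    Hit("P2", 0.60, frozenset({"GO:0005524"})),
    Hit("P3", 0.35, frozenset({"GO:0005524", "GO:0003677"})),
]

features = annotation_features(hits)
accepted = sorted(term for term, f in features.items() if accept(f))
print(accepted)
```

In this toy data, an annotation carried only by the single best hit or only by a weak hit has low support and is rejected, while the annotation shared by all three hits is kept. A real system would replace `accept` with a classifier trained on labeled correct and erroneous annotations.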
Pages: 781-790
Number of pages: 10