Knowledge-based selection of targets for structural genomics

Cited by: 13
Authors
Frishman, D [1]
Affiliations
[1] GSF, Natl Res Ctr Environm & Hlth, Inst Bioinformat, D-85764 Neuherberg, Germany
Source
PROTEIN ENGINEERING | 2002 / Vol. 15 / Issue 03
Keywords
fold recognition; genome analysis; sequence clustering; structural genomics
DOI
10.1093/protein/15.3.169
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The problem of rational target selection for protein structure determination in structural genomics projects on microbes is addressed. A flexible computational procedure is described that directly incorporates the whole body of annotation available in the PEDANT genome database into the sequence clustering and selection process in order to identify proteins that are likely to possess currently unknown structural domains. Filtering out gene products based on predicted structural features, such as known three-dimensional structures and transmembrane regions, allows one to reduce the complexity of neighbor relationships between sequences and all but eliminates the need for further partitioning of single-linkage clusters into disjoint protein groups corresponding to homologous families. The results of a large-scale computation experiment in which exemplary target selection for 32 prokaryotic genomes was conducted are presented.
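The filter-then-cluster idea in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the procedure from the paper: the `Protein` fields, the function names, and the toy similarity pairs are all hypothetical, and real input would come from genome annotation and sequence comparison rather than hand-written flags. It shows how removing proteins with a known structure or predicted transmembrane regions before single-linkage clustering can break the chains that would otherwise merge unrelated sequences into one cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    # Hypothetical record; the flags stand in for annotation-derived features.
    pid: str
    has_known_structure: bool = False
    is_transmembrane: bool = False


def filter_candidates(proteins):
    """Drop proteins with a known 3D structure or predicted transmembrane regions."""
    return [p for p in proteins
            if not (p.has_known_structure or p.is_transmembrane)]


def single_linkage_clusters(ids, similar_pairs):
    """Single-linkage clustering: connected components found with union-find."""
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        # Ignore pairs that involve a protein removed by the filter.
        if a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Toy data (invented for illustration): B links A and C, and has a known structure.
proteins = [
    Protein("A"),
    Protein("B", has_known_structure=True),
    Protein("C"),
    Protein("D"),
    Protein("E"),
]
pairs = [("A", "B"), ("B", "C"), ("D", "E")]

unfiltered = single_linkage_clusters([p.pid for p in proteins], pairs)
clusters = single_linkage_clusters(
    [p.pid for p in filter_candidates(proteins)], pairs
)
print(unfiltered)
print(clusters)
```

In this toy case, A and C are joined only through B, so once B is filtered out they fall into separate clusters and no further partitioning of the merged cluster is needed.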
Pages: 169-183
Page count: 15