Exploiting homogeneity in protein sequence clusters for construction of protein family hierarchies

Cited by: 5
Authors
Chen, Chien-Yu
Chung, Wen-Chin
Su, Chung-Tsai
Affiliations
[1] Natl Taiwan Univ, Dept Bioind Mechatron Engn, Taipei 106, Taiwan
[2] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
Keywords
protein sequence clustering; family analysis; twilight zone; hierarchical algorithm
DOI
10.1016/j.patcog.2005.12.008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the field of proteomics, protein hierarchies based on sequence analysis have been extensively applied to automate the annotation of new proteins and to facilitate the discovery and analysis of protein families. However, the presence of ambiguous similarities in large databases makes it harder to deliver protein family hierarchies with favorable sensitivity and specificity. This work develops the HomoClust algorithm, which exploits the homogeneity of protein sequences in generating protein family hierarchies. HomoClust improves the clustering quality of traditional hierarchical clustering algorithms by adopting different clustering mechanisms for different levels of sequence similarity. By incorporating homogeneity detection into the clustering process, HomoClust increases the sensitivity of protein clusters without reducing their high specificity. © 2006 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Pages: 2356-2369
Number of pages: 14
Related papers
47 in total
[1]   Clustering of proximal sequence space for the identification of protein families [J].
Abascal, F ;
Valencia, A .
BIOINFORMATICS, 2002, 18 (07) :908-921
[2]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[3]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[4]   [Anonymous], 1991, APPL MULTIVARIATE DA
[5]   InterPro - an integrated documentation resource for protein families, domains and functional sites [J].
Apweiler, R ;
Attwood, TK ;
Bairoch, A ;
Bateman, A ;
Birney, E ;
Biswas, M ;
Bucher, P ;
Cerutti, L ;
Corpet, F ;
Croning, MDR ;
Durbin, R ;
Falquet, L ;
Fleischmann, W ;
Gouzy, J ;
Hermjakob, H ;
Hulo, N ;
Jonassen, I ;
Kahn, D ;
Kanapin, A ;
Karavidopoulou, Y ;
Lopez, R ;
Marx, B ;
Mulder, NJ ;
Oinn, TM ;
Pagni, M ;
Servant, F ;
Sigrist, CJA ;
Zdobnov, EM .
BIOINFORMATICS, 2000, 16 (12) :1145-1150
[6]   Proteome Analysis Database: online application of InterPro and CluSTr for the functional classification of proteins in whole genomes [J].
Apweiler, R ;
Biswas, W ;
Fleischmann, W ;
Kanapin, A ;
Karavidopoulou, Y ;
Kersey, P ;
Kriventseva, EV ;
Mittard, V ;
Mulder, N ;
Phan, I ;
Zdobnov, E .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :44-48
[7]   The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 [J].
Bairoch, A ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :45-48
[8]   Clustering protein sequences-structure prediction by transitive homology [J].
Bolten, E ;
Schliep, A ;
Schneckener, S ;
Schomburg, D ;
Schrader, R .
BIOINFORMATICS, 2001, 17 (10) :935-941
[9]   Incremental generation of summarized clustering hierarchy for protein family analysis [J].
Chen, CY ;
Oyang, YJ ;
Juan, HF .
BIOINFORMATICS, 2004, 20 (16) :2586-2596
[10]   Significance of Z-value statistics of Smith-Waterman scores for protein alignments [J].
Comet, JP ;
Aude, JC ;
Glémet, E ;
Risler, JL ;
Hénaut, A ;
Slonimski, PP ;
Codani, JJ .
COMPUTERS & CHEMISTRY, 1999, 23 (3-4) :317-331