DbClustal: rapid and reliable global multiple alignments of protein sequences detected by database searches

Cited by: 150
Authors
Thompson, JD [1]
Plewniak, F [1]
Thierry, JC [1]
Poch, O [1]
Affiliation
[1] ULP, INSERM, CNRS, Inst Genet & Biol Mol & Cellulaire, Lab Biol & Gen, F-67404 Illkirch Graffenstaden, France
DOI
10.1093/nar/28.15.2919
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
DbClustal addresses the important problem of automatically building a multiple alignment of the top-scoring full-length sequences detected by a database homology search. By combining the advantages of both local and global alignment algorithms in a single system, DbClustal is able to provide accurate global alignments of highly divergent, complex sequence sets. Local alignment information is incorporated into a ClustalW global alignment in the form of a list of anchor points between pairs of sequences. The method is demonstrated using anchors supplied by the Blast post-processing program, Ballast. The rapidity and reliability of DbClustal have been demonstrated on the recently annotated Pyrococcus abyssi proteome, where the proportion of alignments containing totally misaligned sequences was reduced from 20% to <2%. A website has been implemented that offers BlastP database searches with automatic alignment of the top hits by DbClustal.
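The abstract's central idea, local anchor points constraining a global alignment, can be illustrated for the pairwise case. What follows is a minimal sketch under stated assumptions. It treats each anchor as a hard constraint, meaning the two residues must share a column, and runs a plain Needleman-Wunsch alignment with a linear gap penalty on the segments between consecutive anchors. The scoring values, the function names `needleman_wunsch` and `anchored_alignment`, and the hard-constraint treatment are all hypothetical. The abstract does not state that DbClustal uses this exact mechanism, and the sketch does not model ClustalW's progressive profile alignment or Ballast's anchor format.

```python
from typing import List, Tuple

# Hypothetical scoring parameters, not taken from DbClustal or ClustalW.
MATCH, MISMATCH, GAP = 1, -1, -2


def _pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: str, b: str) -> Tuple[str, str]:
    """Plain global alignment of a and b with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,
                score[i][j - 1] + GAP,
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_alignment(a: str, b: str, anchors: List[Tuple[int, int]]) -> Tuple[str, str]:
    """Global alignment forced through anchor points (hypothetical helper).

    Each anchor (i, j) uses 0-based positions and requires a[i] and b[j]
    to share a column. Anchors must increase strictly in both coordinates.
    """
    out_a, out_b = [], []
    prev_i, prev_j = 0, 0
    for i, j in sorted(anchors):
        if i < prev_i or j < prev_j:
            raise ValueError("anchors must be strictly increasing in both sequences")
        # Align the unanchored segment before this anchor, then emit the anchor column.
        seg_a, seg_b = needleman_wunsch(a[prev_i:i], b[prev_j:j])
        out_a.append(seg_a + a[i])
        out_b.append(seg_b + b[j])
        prev_i, prev_j = i + 1, j + 1
    # Align whatever remains after the last anchor.
    seg_a, seg_b = needleman_wunsch(a[prev_i:], b[prev_j:])
    out_a.append(seg_a)
    out_b.append(seg_b)
    return "".join(out_a), "".join(out_b)
```

For example, `anchored_alignment("ACGT", "AGT", [(2, 1)])` forces the two `G` residues into the same column and returns `("ACGT", "A-GT")`. In this sketch, splitting the problem at anchors replaces one large dynamic-programming table with several smaller ones, which also illustrates why anchors can reduce the cost of global alignment.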
Pages: 2919–2926
Number of pages: 8