Automatic assessment of alignment quality

Cited by: 95
Authors
Lassmann, T [1]
Sonnhammer, ELL [1]
Affiliation
[1] Karolinska Inst, Ctr Genom & Bioinformat, S-17177 Stockholm, Sweden
DOI
10.1093/nar/gki1020
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Multiple sequence alignments play a central role in the annotation of novel genomes. Given the biological and computational complexity of this task, the automatic generation of high-quality alignments remains challenging. Since multiple alignments are usually employed at the very start of data analysis pipelines, it is crucial to ensure high alignment quality. We describe a simple, yet elegant, solution to assess the biological accuracy of alignments automatically. Our approach is based on the comparison of several alignments of the same sequences. We introduce two functions to compare alignments: the average overlap score and the multiple overlap score. The former identifies difficult alignment cases by expressing the similarity among several alignments, while the latter estimates the biological correctness of individual alignments. We implemented both functions in the MUMSA program and demonstrate the overall robustness and accuracy of both functions on three large benchmark sets.
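The abstract names the two comparison functions but does not define them. The sketch below is only one plausible reading, not the MUMSA implementation. It assumes an alignment is a list of equal-length gapped strings with '-' as the gap character, that two alignments are compared through the set of residue pairs they place in the same column, and that the pairwise overlap is twice the number of shared pairs divided by the total number of pairs. The function names (`aligned_pairs`, `overlap_score`, `average_overlap_score`, `multiple_overlap_score`) and the normalization of the multiple overlap score are assumptions made for illustration.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, residue index),
    so the pairs are comparable across different alignments of the
    same sequences.
    """
    counters = [0] * len(alignment)  # next residue index per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def overlap_score(a, b):
    """Assumed pairwise overlap: 2 * |shared pairs| / (|pairs a| + |pairs b|)."""
    pa, pb = aligned_pairs(a), aligned_pairs(b)
    total = len(pa) + len(pb)
    return 2 * len(pa & pb) / total if total else 1.0


def average_overlap_score(alignments):
    """Mean pairwise overlap over all pairs of alignments."""
    scores = [overlap_score(a, b) for a, b in combinations(alignments, 2)]
    return sum(scores) / len(scores)


def multiple_overlap_score(alignments, index):
    """Assumed per-alignment score: average support of its aligned pairs.

    For every aligned pair in alignments[index], count how many of the
    other alignments contain the same pair, then normalize by the
    maximum possible support.
    """
    target = aligned_pairs(alignments[index])
    others = [aligned_pairs(a) for i, a in enumerate(alignments) if i != index]
    if not target or not others:
        return 1.0
    support = sum(sum(pair in o for o in others) for pair in target)
    return support / (len(target) * len(others))


if __name__ == "__main__":
    # Hypothetical toy alignments of the same two sequences.
    a1 = ["AC-GT", "ACAGT"]
    a2 = ["ACG-T", "ACAGT"]
    print(average_overlap_score([a1, a2, a1]))
    print(multiple_overlap_score([a1, a2, a1], 1))
```

Under this reading, a low average score flags a set of sequences on which the alignment programs disagree, while the per-alignment score ranks individual alignments by how well the others support them.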
Pages: 7120-7128
Number of pages: 9