MAFFT-DASH: integrated protein sequence and structural alignment

Cited by: 660
Authors
Rozewicki, John [1, 2]
Li, Songling [1, 2]
Amada, Karlou Mar [2, 3]
Standley, Daron M. [1, 2]
Katoh, Kazutaka [1, 2]
Affiliations
[1] Osaka Univ, Microbial Dis Res Inst, Dept Genome Informat, Genome Informat Res Ctr, 3-1 Yamadaoka, Suita, Osaka 5650871, Japan
[2] Osaka Univ, Immunol Frontier Res Ctr, Syst Immunol Lab, 3-1 Yamadaoka, Suita, Osaka 5650871, Japan
[3] TILI IO PTY LTD, L7 380 Docklands Dr, Docklands, Vic 3008, Australia
Keywords
MULTIPLE; ACCURACY; DATABASE; PROGRAM
DOI
10.1093/nar/gkz342
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Here, we describe a web server that integrates structural alignments with the MAFFT multiple sequence alignment (MSA) tool. For this purpose, we have prepared a web-based Database of Aligned Structural Homologs (DASH), which provides structural alignments at the domain and chain levels for all proteins in the Protein Data Bank (PDB), and can be queried interactively or by a simple REST-like API. MAFFT-DASH integration can be invoked with a single flag on either the web (https://mafft.cbrc.jp/alignment/server/) or command-line versions of MAFFT. In our benchmarks using 878 cases from the BAliBase, HomFam, OXFam, Mattbench and SISYPHUS datasets, MAFFT-DASH showed 10-20% improvement over standard MAFFT for MSA problems with weak similarity, in terms of Sum-of-Pairs (SP), a measure of how well a program succeeds at aligning input sequences in comparison to a reference alignment. When MAFFT alignments were supplemented with homologous sequences, further improvement was observed. Potential applications of DASH beyond MSA enrichment include functional annotation through detection of remote homology and assembly of template libraries for homology modeling.
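The Sum-of-Pairs (SP) measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definition of SP as the fraction of residue pairs aligned in the reference that are also aligned in the test alignment; the function names `aligned_pairs` and `sp_score` are hypothetical, and the benchmark's exact scoring (for example, restricting to core columns) may differ.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings, one per sequence.
    Each residue is identified by (sequence index, ungapped residue index).
    """
    counters = [0] * len(alignment)  # next residue index for each sequence
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_idx, char in enumerate(column):
            if char != gap:
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every pair of non-gap residues in this column counts as aligned.
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(test, reference):
    """Fraction of reference-aligned residue pairs reproduced by `test`.

    Assumes both alignments list the same sequences in the same order.
    """
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


reference = ["AC-GT", "ACAGT"]
test = ["ACG-T", "ACAGT"]
print(sp_score(test, reference))  # 3 of the 4 reference pairs are recovered
```

In this toy case the test alignment misplaces one residue, so it recovers three of the four reference pairs.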
Pages: W5-W10
Page count: 6