Alignathon: a competitive assessment of whole-genome alignment methods

Cited by: 70
Authors
Earl, Dent [1, 2]
Nguyen, Ngan [1, 2]
Hickey, Glenn [3]
Harris, Robert S. [4]
Fitzgerald, Stephen [5]
Beal, Kathryn [5]
Seledtsov, Igor [6]
Molodtsov, Vladimir [6]
Raney, Brian J. [1]
Clawson, Hiram [1]
Kim, Jaebum [7]
Kemena, Carsten [8, 9, 10]
Chang, Jia-Ming [8, 9, 11]
Erb, Ionas [8, 9]
Poliakov, Alexander [12]
Hou, Minmei [13]
Herrero, Javier [5, 14]
Kent, William James [1, 2]
Solovyev, Victor [6]
Darling, Aaron E. [15]
Ma, Jian [16, 17]
Notredame, Cedric [8, 9]
Brudno, Michael [18, 19, 20, 21]
Dubchak, Inna [12, 22]
Haussler, David [1, 2, 23]
Paten, Benedict [1, 2]
Affiliations
[1] Univ Calif Santa Cruz, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
[2] Univ Calif Santa Cruz, Dept Biomol Engn, Santa Cruz, CA 95064 USA
[3] McGill Univ, Sch Comp Sci, Montreal, PQ H3A 0G4, Canada
[4] Penn State Univ, Dept Biol, University Pk, PA 16801 USA
[5] European Bioinformat Inst, European Mol Biol Lab, Cambridge CB10 1SD, England
[6] Softberry Inc, Mt Kisco, NY 10549 USA
[7] Konkuk Univ, Dept Anim Biotechnol, Seoul 143701, South Korea
[8] Ctr Genom Regulat CRG, Barcelona 08003, Spain
[9] UPF, Barcelona 08003, Spain
[10] Univ Munster, Inst Evolut & Biodivers, D-48149 Munster, Germany
[11] CNRS, UPR 1142, Inst Human Genet IGH, Montpellier, France
[12] Dept Energy Joint Genome Inst, Walnut Creek, CA 94598 USA
[13] No Illinois Univ, Dept Comp Sci, De Kalb, IL 60115 USA
[14] Genome Anal Ctr, Norwich NR4 7UH, Norfolk, England
[15] Univ Technol Sydney, I Inst 3, Sydney, NSW 2007, Australia
[16] Univ Illinois, Dept Bioengn, Urbana, IL 61801 USA
[17] Univ Illinois, Inst Genom Biol, Urbana, IL 61801 USA
[18] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada
[19] Univ Toronto, Donnelly Ctr, Toronto, ON M5S 3G4, Canada
[20] Hosp Sick Children, Ctr Computat Med, Toronto, ON M5G 1X8, Canada
[21] Hosp Sick Children, Genet & Genome Biol Program, Toronto, ON M5G 1X8, Canada
[22] Lawrence Berkeley Natl Lab, Berkeley, CA 94710 USA
[23] Howard Hughes Med Inst, Chevy Chase, MD 20815 USA
Funding
European Research Council; US National Science Foundation; Wellcome Trust (UK);
Keywords
MULTIPLE ALIGNMENT; SEQUENCE; SIMULATION; RELIABILITY; BENCHMARKS; CHALLENGES; FRAMEWORK; GENES; TOOLS; SIZE;
DOI
10.1101/gr.174920.114
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010; 081704;
Abstract
Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: two were simulated and based on primate and mammalian phylogenies, and one comprised 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.
Pages: 2077-2089
Page count: 13