An Eulerian path approach to DNA fragment assembly

被引:769
作者
Pevzner, PA
Tang, HX
Waterman, MS [1 ]
机构
[1] Univ So Calif, Dept Math, Los Angeles, CA 90089 USA
[2] Univ So Calif, Dept Biol Sci, Los Angeles, CA 90089 USA
[3] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
关键词
D O I
10.1073/pnas.171285098
中图分类号
O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];
学科分类号
07 ; 0710 ; 09 ;
摘要
For the last 20 years, fragment assembly in DNA sequencing followed the "overlap-layout-consensus" paradigm that is used in all currently available assembly tools. Although this approach proved useful in assembling clones, it faces difficulties in genomic shotgun assembly. We abandon the classical "overlap-layout-consensus" approach in favor of a new EULER algorithm that, for the first time, resolves the 20-year-old "repeat problem" in fragment assembly. Our main result is the reduction of the fragment assembly to a variation of the classical Eulerian path problem that allows one to generate accurate solutions of large-scale sequencing problems. EULER, in contrast to the CELERA assembler, does not mask such repeats but uses them instead as a powerful fragment assembly tool.
引用
收藏
页码:9748 / 9753
页数:6
相关论文
共 27 条
[1]  
[Anonymous], 2001, RECOMB
[2]   Low-redundancy sequencing of the entire Lactococcus lactis IL1403 genome [J].
Bolotin, A ;
Mauger, S ;
Malarme, K ;
Ehrlich, SD ;
Sorokin, A .
ANTONIE VAN LEEUWENHOEK INTERNATIONAL JOURNAL OF GENERAL AND MOLECULAR MICROBIOLOGY, 1999, 76 (1-4) :27-76
[3]   A new DNA sequence assembly program [J].
Bonfield, JK ;
Smith, KF ;
Staden, R .
NUCLEIC ACIDS RESEARCH, 1995, 23 (24) :4992-4999
[4]   THE ACCURACY OF DNA-SEQUENCES - ESTIMATING SEQUENCE QUALITY [J].
CHURCHILL, GA ;
WATERMAN, MS .
GENOMICS, 1992, 14 (01) :89-98
[5]  
DRMANAC R, 1989, GENOMICS, V4, P114
[6]   WHOLE-GENOME RANDOM SEQUENCING AND ASSEMBLY OF HAEMOPHILUS-INFLUENZAE RD [J].
FLEISCHMANN, RD ;
ADAMS, MD ;
WHITE, O ;
CLAYTON, RA ;
KIRKNESS, EF ;
KERLAVAGE, AR ;
BULT, CJ ;
TOMB, JF ;
DOUGHERTY, BA ;
MERRICK, JM ;
MCKENNEY, K ;
SUTTON, G ;
FITZHUGH, W ;
FIELDS, C ;
GOCAYNE, JD ;
SCOTT, J ;
SHIRLEY, R ;
LIU, LI ;
GLODEK, A ;
KELLEY, JM ;
WEIDMAN, JF ;
PHILLIPS, CA ;
SPRIGGS, T ;
HEDBLOM, E ;
COTTON, MD ;
UTTERBACK, TR ;
HANNA, MC ;
NGUYEN, DT ;
SAUDEK, DM ;
BRANDON, RC ;
FINE, LD ;
FRITCHMAN, JL ;
FUHRMANN, JL ;
GEOGHAGEN, NSM ;
GNEHM, CL ;
MCDONALD, LA ;
SMALL, KV ;
FRASER, CM ;
SMITH, HO ;
VENTER, JC .
SCIENCE, 1995, 269 (5223) :496-512
[7]  
Fleischner H, 1990, Eulerian Graphs and Related Topics, Annals of Discrete Mathematics
[8]   Consed: A graphical tool for sequence finishing [J].
Gordon, D ;
Abajian, C ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :195-202
[9]  
Green P, 1994, DOCUMENTATION PHRAP
[10]   CAP3: A DNA sequence assembly program [J].
Huang, XQ ;
Madan, A .
GENOME RESEARCH, 1999, 9 (09) :868-877