Automated sequence preprocessing in a large-scale sequencing environment

Cited by: 29
Authors
Wendl, MC [1]
Dear, S
Hodgson, D
Hillier, L
Affiliations
[1] Washington Univ, Genome Sequencing Ctr, St Louis, MO 63108 USA
[2] Sanger Ctr, Cambridge CB10 1SA, England
Source
GENOME RESEARCH | 1998 / Vol. 8 / Issue 09
Funding
Wellcome Trust (UK);
DOI
10.1101/gr.8.9.975
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
A software system for transforming fragments from four-color fluorescence-based gel electrophoresis experiments into assembled sequence is described. It has been developed for large-scale processing of all trace data, including shotgun and finishing reads, regardless of clone origin. Design considerations are discussed in detail, as are programming implementation and graphic tools. The importance of input validation, record tracking, and use of base quality values is emphasized. Several quality analysis metrics are proposed and applied to sample results from recently sequenced clones. Such quantities prove to be a valuable aid in evaluating modifications of sequencing protocol. The system is in full production use at both the Genome Sequencing Center and the Sanger Centre, whose combined production is approximately 100,000 sequencing reads per week.
Pages: 975-984
Number of pages: 10