Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology

Times cited: 162
Authors
Otto, Thomas D. [1]
Sanders, Mandy [1]
Berriman, Matthew [1]
Newbold, Chris [1, 2]
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[2] Univ Oxford, John Radcliffe Hosp, Weatherall Inst Mol Med, Oxford OX3 9DS, England
Funding
Wellcome Trust (UK);
Keywords
GENOME; SYSTEM;
DOI
10.1093/bioinformatics/btq269
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy.
Results: Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications.
Pages: 1704-1707
Page count: 4
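Illustrative note: the abstract describes correcting a reference by repeatedly aligning short reads to it. The Python sketch below is a toy illustration of that general idea only, not the published iCORN implementation. Everything in it is an assumption made for illustration: the function names (`best_hit`, `correct_once`, `iterative_correction`), the brute-force ungapped placement, the substitution-only corrections, and the thresholds `max_mismatches`, `min_depth` and `min_fraction`. It does not reproduce the tool's read mapper, its indel handling, or its criteria for accepting a correction.

```python
from collections import Counter


def best_hit(reference, read, max_mismatches):
    """Return the start of the ungapped placement with the fewest mismatches,
    or None if every placement exceeds max_mismatches (hypothetical toy mapper)."""
    best_pos, best_mm = None, max_mismatches + 1
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mm = sum(1 for a, b in zip(window, read) if a != b)
        if mm < best_mm:
            best_pos, best_mm = start, mm
    return best_pos


def correct_once(reference, reads, max_mismatches=1, min_depth=3, min_fraction=0.8):
    """One round: place reads, pile up bases, apply well-supported substitutions.
    Returns the updated reference and the number of bases changed."""
    pileup = [Counter() for _ in reference]
    for read in reads:
        start = best_hit(reference, read, max_mismatches)
        if start is None:
            continue  # read does not map under the current reference
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1

    corrected = list(reference)
    changes = 0
    for i, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little evidence to judge this position
        base, support = counts.most_common(1)[0]
        if base != reference[i] and support / depth >= min_fraction:
            corrected[i] = base
            changes += 1
    return "".join(corrected), changes


def iterative_correction(reference, reads, max_iterations=5, **kwargs):
    """Repeat correct_once until a round changes nothing or the limit is hit.
    Returns the final reference and the per-round change counts."""
    history = []
    for _ in range(max_iterations):
        reference, changes = correct_once(reference, reads, **kwargs)
        history.append(changes)
        if changes == 0:
            break
    return reference, history


if __name__ == "__main__":
    # Made-up example data: error-free 8-base reads tiled over a short sequence.
    truth = "ACGTTGCAAGCTTAGGCATCGA"
    reads = [truth[i:i + 8] for i in range(len(truth) - 7)]
    draft = "ACGTTGTAAGATTACGCATCGA"  # three substitution errors
    print(iterative_correction(draft, reads))
```

The point of iterating in this sketch is that a read spanning several nearby errors may fail to map at first; once a neighbouring error has been corrected from other reads, the same read can map in a later round and support further corrections.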