From next-generation sequencing alignments to accurate comparison and validation of single-nucleotide variants: the pibase software

Cited by: 24
Authors
Forster, Michael [1]
Forster, Peter [2, 3]
Elsharawy, Abdou [1]
Hemmrich, Georg [1]
Kreck, Benjamin [1]
Wittig, Michael [1]
Thomsen, Ingo [1]
Stade, Bjoern [1]
Barann, Matthias [1]
Ellinghaus, David [1]
Petersen, Britt-Sabina [1]
May, Sandra [1]
Melum, Espen [4, 5]
Schilhabel, Markus B. [1]
Keller, Andreas [6]
Schreiber, Stefan [1]
Rosenstiel, Philip [1]
Franke, Andre [1]
Affiliations
[1] Univ Kiel, Inst Clin Mol Biol, D-24105 Kiel, Germany
[2] Inst Forens Genet, D-48161 Munster, Germany
[3] Univ Cambridge, Murray Edwards Coll, Cambridge CB3 0DF, England
[4] Harvard Univ, Sch Med, Brigham & Womens Hosp, Div Gastroenterol Hepatol & Endoscopy, Boston, MA 02115 USA
[5] Univ Oslo, Rikshosp, Oslo Univ Hosp, Clin Specialized Med & Surg, Norwegian PSC Res Ctr, N-0027 Oslo, Norway
[6] Univ Saarland, Dept Human Genet, D-66123 Saarbrucken, Germany
Keywords
GENOME; DISCOVERY; FRAMEWORK; TOOL; ASSOCIATION; ORIGIN; FORMAT
DOI
10.1093/nar/gks836
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Scientists working with single-nucleotide variants (SNVs) inferred by next-generation sequencing software often need further information about true variants, artifacts and sequence coverage gaps. In clinical diagnostics, for example, SNVs must usually be validated by visual inspection or by several independent SNV callers. We demonstrate here that 0.5-60% of relevant SNVs might go undetected because of coverage gaps, or might be misidentified. Even low error rates can overwhelm the true biological signal, especially in clinical diagnostics, in research comparing healthy with affected cells, in archaeogenetic dating or in forensics. For these reasons, we have developed a package called pibase, which is applicable to diploid and haploid genome, exome or targeted enrichment data. pibase extracts details on nucleotides from alignment files at user-specified coordinates and identifies reproducible genotypes, if present. In test cases, pibase identifies genotypes at 99.98% specificity, 10-fold better than other tools. pibase also provides pairwise comparisons between healthy and affected cells using nucleotide signals (10-fold more accurately than a genotype-based approach, as we show in our case study of monozygotic twins). This comparison tool also solves the problem of detecting allelic imbalance within heterozygous SNVs in copy number variation loci, or in heterogeneous tumor sequences.
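As a rough illustration of what comparing two samples by nucleotide signals rather than by called genotypes can look like, the sketch below counts quality-filtered bases observed at one coordinate in each sample and tests whether the reference/alternate counts differ. This is not pibase's code: the function names, the base-quality threshold of 20 and the choice of a two-sided Fisher exact test are assumptions made here for illustration, and the input is assumed to be the bases and qualities already extracted from an alignment file at that coordinate.

```python
from collections import Counter
from math import comb


def count_bases(bases, quals, min_qual=20):
    """Count A/C/G/T observations whose base quality is at least min_qual.

    bases: string of observed nucleotides at one coordinate (hypothetical input).
    quals: matching sequence of integer base qualities.
    """
    return Counter(
        b for b, q in zip(bases, quals) if q >= min_qual and b in "ACGT"
    )


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


def compare_samples(counts_1, counts_2, ref, alt):
    """P-value for a difference in ref/alt base counts between two samples."""
    return fisher_exact_two_sided(
        counts_1[ref], counts_1[alt], counts_2[ref], counts_2[alt]
    )


# Example: sample 1 shows only the reference base, sample 2 a roughly even mix.
healthy = count_bases("A" * 20, [30] * 20)
affected = count_bases("A" * 10 + "G" * 10, [30] * 20)
p_value = compare_samples(healthy, affected, ref="A", alt="G")
```

Working on counts instead of genotype calls keeps the read-depth information, so a shift in allele fraction can be tested even when both samples would receive the same genotype label.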
Pages: 12