Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis

Cited by: 112
Authors
Guo, Yan [1]
Dai, Yulin [1]
Yu, Hui [1]
Zhao, Shilin [1]
Samuels, David C. [2]
Shyr, Yu [3]
Affiliations
[1] Vanderbilt Univ, Dept Canc Biol, 221 Kirkland Hall, Nashville, TN 37235 USA
[2] Vanderbilt Univ, Sch Med, Vanderbilt Genet Inst, Dept Mol Physiol & Biophys, Nashville, TN 37212 USA
[3] Vanderbilt Univ, Dept Biostat, 221 Kirkland Hall, Nashville, TN 37235 USA
Keywords
Human reference genome; GRCh37; GRCh38; High throughput sequencing; SNP; Copy number variation; Structural variant; COPY NUMBER VARIATIONS; MITOCHONDRIAL-DNA; HUMAN-CHROMOSOMES; QUALITY-CONTROL; RNA-SEQ; GENOME; EXOME; PRACTICABILITY; DISCOVERY; ASSEMBLER
DOI
10.1016/j.ygeno.2017.01.005
Chinese Library Classification (CLC) numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Analysis of high-throughput sequencing data starts with alignment against a reference genome, which is the foundation for all re-sequencing data analyses. Each new release of the human reference genome has brought improved accuracy and completeness. It is presumed that the latest release of the human reference genome, GRCh38, will benefit high-throughput sequencing data analysis by providing greater accuracy, but the extent of this improvement has not yet been quantified. We conducted a study comparing genomic analysis results between the GRCh38 reference and its predecessor, GRCh37. Through analyses of alignment, single nucleotide polymorphisms, small insertions/deletions, and copy number and structural variants, we show that GRCh38 offers more accurate analysis of human sequencing data overall. More importantly, GRCh38 produced fewer false positive structural variants. In conclusion, GRCh38 is an improvement over GRCh37 not only as a genome assembly but also because it yields more reliable genomic analysis results. (C) 2017 Published by Elsevier Inc.
Pages: 83-90
Number of pages: 8