Nanopore sequencing and assembly of a human genome with ultra-long reads

Cited by: 1187
Authors
Jain, Miten [1]
Koren, Sergey [2]
Miga, Karen H. [1]
Quick, Josh [3]
Rand, Arthur C. [1]
Sasani, Thomas A. [4, 5]
Tyson, John R. [6, 7]
Beggs, Andrew D. [8]
Dilthey, Alexander T. [2]
Fiddes, Ian T. [1]
Malla, Sunir [9]
Marriott, Hannah [9]
Nieto, Tom [8]
O'Grady, Justin [10]
Olsen, Hugh E. [1]
Pedersen, Brent S. [5]
Rhie, Arang
Richardson, Hollian [10]
Quinlan, Aaron R. [4, 5, 11]
Snutch, Terrance P. [6, 7]
Tee, Louise [8]
Paten, Benedict [1]
Phillippy, Adam M. [2]
Simpson, Jared T. [12, 13]
Loman, Nicholas J. [3]
Loose, Matthew [9]
Affiliations
[1] Univ Calif Santa Cruz, Genom Inst, Santa Cruz, CA 95064 USA
[2] NHGRI, Computat & Stat Genom Branch, Genome Informat Sect, Bethesda, MD 20892 USA
[3] Univ Birmingham, Inst Microbiol & Infect, Birmingham, W Midlands, England
[4] Univ Utah, Dept Human Genet, Salt Lake City, UT USA
[5] Univ Utah, USTAR Ctr Genet Discovery, Salt Lake City, UT USA
[6] Univ British Columbia, Michael Smith Labs, Vancouver, BC, Canada
[7] Univ British Columbia, Djavad Mowafaghian Ctr Brain Hlth, Vancouver, BC, Canada
[8] Univ Birmingham, Inst Canc & Genom Sci, Surg Res Lab, Birmingham, W Midlands, England
[9] Univ Nottingham, Sch Life Sci, DeepSeq, Nottingham, England
[10] Univ East Anglia, Norwich Med Sch, Norwich, Norfolk, England
[11] Univ Utah, Dept Biomed Informat, Salt Lake City, UT USA
[12] Inst Canc Res, Toronto, ON, Canada
[13] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
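Funding
US National Institutes of Health (NIH); UK Biotechnology and Biological Sciences Research Council (BBSRC); Wellcome Trust (UK); Canadian Institutes of Health Research (CIHR);
Keywords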
DNA; VARIANTS; METHYLATION; LENGTH; GENES; GAPS; TIME;
DOI
10.1038/nbt.4060
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
We report the sequencing and assembly of a reference genome for the human GM12878 Utah/CEPH cell line using the MinION (Oxford Nanopore Technologies) nanopore sequencer. 91.2 Gb of sequence data, representing ~30x theoretical coverage, were produced. Reference-based alignment enabled detection of large structural variants and epigenetic modifications. De novo assembly of nanopore reads alone yielded a contiguous assembly (NG50 ~3 Mb). We developed a protocol to generate ultra-long reads (N50 > 100 kb, read lengths up to 882 kb). Incorporating an additional 5x coverage of these ultra-long reads more than doubled the assembly contiguity (NG50 ~6.4 Mb). The final assembled genome was 2,867 million bases in size, covering 85.8% of the reference. Assembly accuracy, after incorporating complementary short-read sequencing data, exceeded 99.8%. Ultra-long reads enabled assembly and phasing of the 4-Mb major histocompatibility complex (MHC) locus in its entirety, measurement of telomere repeat length, and closure of gaps in the reference human genome assembly GRCh38.
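The abstract reports contiguity as N50 and NG50 and throughput as theoretical coverage. As a reading aid, the sketch below shows how these metrics are conventionally computed. It is not the authors' pipeline: the function names are hypothetical, the toy lengths are invented for illustration, and the 3.1 Gb genome size is an assumption rather than a figure from this record.

```python
def n50(lengths, total=None):
    """Return the length L of the sequence at which the running sum of
    lengths (sorted longest first) reaches at least half of `total`.

    With total=None this is N50 (half of the summed lengths).
    Passing an estimated genome size as `total` gives NG50.
    """
    if total is None:
        total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # the sequences cover less than half of `total`


def theoretical_coverage(total_bases, genome_size):
    """Sequenced bases divided by genome size (average depth)."""
    return total_bases / genome_size


# Toy contig lengths, invented for illustration only.
contigs = [50, 40, 30, 20, 10]
print(n50(contigs))             # N50 relative to the assembly itself
print(n50(contigs, total=200))  # NG50 relative to an assumed genome size

# 91.2 Gb is from the abstract; 3.1 Gb is an assumed genome size.
print(theoretical_coverage(91.2e9, 3.1e9))
```

NG50 never exceeds N50 when the assembly is smaller than the genome size used, which is why it is the more conservative figure for comparing assemblies of the same genome.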
Pages: 338-+
Page count: 13