De novo assembly and phasing of a Korean human genome

Cited by: 223
Authors
Seo, Jeong-Sun [1, 2, 3, 4, 5]
Rhie, Arang [1, 2, 3]
Kim, Junsoo [1, 4]
Lee, Sangjin [1, 5]
Sohn, Min-Hwan [1, 2, 3]
Kim, Chang-Uk [1, 2, 3]
Hastie, Alex
Cao, Han [6]
Yun, Ji-Young [1, 5]
Kim, Jihye [1, 5]
Kuk, Junho [1, 5]
Park, Gun Hwa [1, 5]
Kim, Juhyeok [1, 5]
Ryu, Hanna [4]
Kim, Jongbum [4]
Roh, Mira [4]
Baek, Jeonghun [4]
Hunkapiller, Michael W. [7]
Korlach, Jonas [7]
Shin, Jong-Yeon [1]
Kim, Changhoon [4]
Affiliations
[1] Seoul Natl Univ, Med Res Ctr, GMI, Seoul 110799, South Korea
[2] Seoul Natl Univ, Coll Med, Dept Biochem & Mol Biol, Seoul 110799, South Korea
[3] Seoul Natl Univ, Grad Sch, Dept Biomed Sci, Seoul 110799, South Korea
[4] Macrogen Inc, Bioinformat Inst, Seoul 153023, South Korea
[5] Macrogen Inc, Genome Inst, Seoul 153023, South Korea
[6] BioNano Genom, San Diego, CA 92121 USA
[7] Pacific Biosci Calif Inc, Menlo Pk, CA 94025 USA
Keywords
STRUCTURAL VARIATION; SEQUENCE; MUTATIONS; ALIGNMENT; GENE
DOI
10.1038/nature20098
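Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract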
Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 (ref. 1) using single-molecule real-time sequencing (ref. 2), next-generation mapping (ref. 3), microfluidics-based linked reads (ref. 4), and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9 Mb and a scaffold N50 size of 44.8 Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03 Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly. We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6 Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The haplotigs accurately characterized the hypervariable major histocompatibility complex region and demonstrated allele configuration in clinically relevant genes such as CYP2D6.
This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of unreported and Asian-specific structural variants, and high-quality haplotyping of clinically relevant alleles for precision medicine.
Pages: 243 / +
Number of pages: 18
Related papers
33 in total
[1]   A global reference for human genetic variation [J].
Altshuler, David M. ;
Durbin, Richard M. ;
Abecasis, Goncalo R. ;
Bentley, David R. ;
Chakravarti, Aravinda ;
Clark, Andrew G. ;
Donnelly, Peter ;
Eichler, Evan E. ;
Flicek, Paul ;
Gabriel, Stacey B. ;
Gibbs, Richard A. ;
Green, Eric D. ;
Hurles, Matthew E. ;
Knoppers, Bartha M. ;
Korbel, Jan O. ;
Lander, Eric S. ;
Lee, Charles ;
Lehrach, Hans ;
Mardis, Elaine R. ;
Marth, Gabor T. ;
McVean, Gil A. ;
Nickerson, Deborah A. ;
Wang, Jun ;
Wilson, Richard K. ;
Boerwinkle, Eric ;
Doddapaneni, Harsha ;
Han, Yi ;
Korchina, Viktoriya ;
Kovar, Christie ;
Lee, Sandra ;
Muzny, Donna ;
Reid, Jeffrey G. ;
Zhu, Yiming ;
Chang, Yuqi ;
Feng, Qiang ;
Fang, Xiaodong ;
Guo, Xiaosen ;
Jian, Min ;
Jiang, Hui ;
Jin, Xin ;
Lan, Tianming ;
Li, Guoqing ;
Li, Jingxiang ;
Li, Yingrui ;
Liu, Shengmao ;
Liu, Xiao ;
Lu, Yao ;
Ma, Xuedi ;
Tang, Meifang ;
Wang, Bo .
NATURE, 2015, 526 (7571) :68-+
[2]   Non-founder mutations in the MEFV gene establish this gene as the cause of familial Mediterranean fever (FMF) [J].
Bernot, A ;
da Silva, C ;
Petit, JL ;
Cruaud, C ;
Caloustian, C ;
Castet, V ;
Ahmed-Arab, M ;
Dross, C ;
Dupont, M ;
Cattan, D ;
Smaoui, N ;
Dodé, C ;
Pêcheux, C ;
Nédelec, B ;
Medaxian, J ;
Rozenbaum, M ;
Rosner, I ;
Delpech, M ;
Grateau, G ;
Demaille, J ;
Weissenbach, J ;
Touitou, I .
HUMAN MOLECULAR GENETICS, 1998, 7 (08) :1317-1325
[3]   De novo assembly of a haplotype-resolved human genome [J].
Cao, Hongzhi ;
Wu, Honglong ;
Luo, Ruibang ;
Huang, Shujia ;
Sun, Yuhui ;
Tong, Xin ;
Xie, Yinlong ;
Liu, Binghang ;
Yang, Hailong ;
Zheng, Hancheng ;
Li, Jian ;
Li, Bo ;
Wang, Yu ;
Yang, Fang ;
Sun, Peng ;
Liu, Siyang ;
Gao, Peng ;
Huang, Haodong ;
Sun, Jing ;
Chen, Dan ;
He, Guangzhu ;
Huang, Weihua ;
Huang, Zheng ;
Li, Yue ;
Tellier, Laurent C. A. M. ;
Liu, Xiao ;
Feng, Qiang ;
Xu, Xun ;
Zhang, Xiuqing ;
Bolund, Lars ;
Krogh, Anders ;
Kristiansen, Karsten ;
Drmanac, Radoje ;
Drmanac, Snezana ;
Nielsen, Rasmus ;
Li, Songgang ;
Wang, Jian ;
Yang, Huanming ;
Li, Yingrui ;
Wong, Gane Ka-Shu ;
Wang, Jun .
NATURE BIOTECHNOLOGY, 2015, 33 (06) :617-+
[4]   Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory [J].
Chaisson, Mark J. ;
Tesler, Glenn .
BMC BIOINFORMATICS, 2012, 13
[5]   APPLICATIONS OF NEXT-GENERATION SEQUENCING Genetic variation and the de novo assembly of human genomes [J].
Chaisson, Mark J. P. ;
Wilson, Richard K. ;
Eichler, Evan E. .
NATURE REVIEWS GENETICS, 2015, 16 (11) :627-640
[6]   Resolving the complexity of the human genome using single-molecule sequencing [J].
Chaisson, Mark J. P. ;
Huddleston, John ;
Dennis, Megan Y. ;
Sudmant, Peter H. ;
Malig, Maika ;
Hormozdiari, Fereydoun ;
Antonacci, Francesca ;
Surti, Urvashi ;
Sandstrom, Richard ;
Boitano, Matthew ;
Landolin, Jane M. ;
Stamatoyannopoulos, John A. ;
Hunkapiller, Michael W. ;
Korlach, Jonas ;
Eichler, Evan E. .
NATURE, 2015, 517 (7536) :608-U163
[7]  
Chin CS, 2013, NAT METHODS, V10, P563, DOI [10.1038/nmeth.2474, 10.1038/NMETH.2474]
[8]  
Costello J, 2015, GENOME BIOL, V16, DOI [10.1186/s13059-014-0559-z, 10.1186/s13059-015-0762-6]
[9]   STAR: ultrafast universal RNA-seq aligner [J].
Dobin, Alexander ;
Davis, Carrie A. ;
Schlesinger, Felix ;
Drenkow, Jorg ;
Zaleski, Chris ;
Jha, Sonali ;
Batut, Philippe ;
Chaisson, Mark ;
Gingeras, Thomas R. .
BIOINFORMATICS, 2013, 29 (01) :15-21
[10]   Real-Time DNA Sequencing from Single Polymerase Molecules [J].
Eid, John ;
Fehr, Adrian ;
Gray, Jeremy ;
Luong, Khai ;
Lyle, John ;
Otto, Geoff ;
Peluso, Paul ;
Rank, David ;
Baybayan, Primo ;
Bettman, Brad ;
Bibillo, Arkadiusz ;
Bjornson, Keith ;
Chaudhuri, Bidhan ;
Christians, Frederick ;
Cicero, Ronald ;
Clark, Sonya ;
Dalal, Ravindra ;
deWinter, Alex ;
Dixon, John ;
Foquet, Mathieu ;
Gaertner, Alfred ;
Hardenbol, Paul ;
Heiner, Cheryl ;
Hester, Kevin ;
Holden, David ;
Kearns, Gregory ;
Kong, Xiangxu ;
Kuse, Ronald ;
Lacroix, Yves ;
Lin, Steven ;
Lundquist, Paul ;
Ma, Congcong ;
Marks, Patrick ;
Maxham, Mark ;
Murphy, Devon ;
Park, Insil ;
Pham, Thang ;
Phillips, Michael ;
Roy, Joy ;
Sebra, Robert ;
Shen, Gene ;
Sorenson, Jon ;
Tomaney, Austin ;
Travers, Kevin ;
Trulson, Mark ;
Vieceli, John ;
Wegener, Jeffrey ;
Wu, Dawn ;
Yang, Alicia ;
Zaccarin, Denis .
SCIENCE, 2009, 323 (5910) :133-138