De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data

Cited by: 35
Authors
Ameur, Adam [1]
Che, Huiwen [1]
Martin, Marcel [2]
Bunikis, Ignas [1]
Dahlberg, Johan [3]
Hoijer, Ida [1]
Haggqvist, Susana [1]
Vezzi, Francesco [2]
Nordlund, Jessica [3]
Olason, Pall [4]
Feuk, Lars [1]
Gyllensten, Ulf [1]
Affiliations
[1] Uppsala Univ, Dept Immunol Genet & Pathol, Sci Life Lab, S-75236 Uppsala, Sweden
[2] Stockholm Univ, DBB, Sci Life Lab, S-11419 Stockholm, Sweden
[3] Uppsala Univ, Dept Med Sci, Sci Life Lab, Mol Med, S-75236 Uppsala, Sweden
[4] Uppsala Univ, Dept Cell & Mol Biol, Sci Life Lab, S-75236 Uppsala, Sweden
Funding
Swedish Research Council
Keywords
de novo assembly; SMRT sequencing; GRCh38; human reference genome; human whole-genome sequencing; population sequencing; Swedish population;
DOI
10.3390/genes9100486
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.
Pages: 16