Discovery of Novel Sequences in 1,000 Swedish Genomes

Cited by: 19
Authors
Eisfeldt, Jesper [1, 2, 3]
Martensson, Gustaf [4]
Ameur, Adam [5]
Nilsson, Daniel [1, 2, 3]
Lindstrand, Anna [1, 3]
Affiliations
[1] Karolinska Inst, Ctr Mol Med, Dept Mol Med & Surg, Stockholm, Sweden
[2] Karolinska Inst, Sci Life Lab, Sci Pk, Solna, Sweden
[3] Karolinska Univ Hosp, Dept Clin Genet, Stockholm, Sweden
[4] KTH Royal Inst Technol, Sch Engn Sci Chem Biotechnol & Hlth, Sci Life Lab, Div Nanobiotechnol,Dept Prot Sci, Stockholm, Sweden
[5] Uppsala Univ, Sci Life Lab, Dept Immunol Genet & Pathol, Uppsala, Sweden
Funding
Swedish Research Council
Keywords
population genomics; novel sequences; de novo assembly; ancestral deletion; genetic variation; protein; diversity; alignment; resource
DOI
10.1093/molbev/msz176
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Novel sequences (NSs), not present in the human reference genome, are abundant and remain largely unexplored. Here, we use de novo assembly to study NSs in 1,000 Swedish individuals first sequenced as part of the SweGen project, revealing a total of 46 Mb in 61,044 distinct contigs of sequence not present in GRCh38. The contigs were aligned to recently published catalogs of Icelandic and Pan-African NSs, as well as to the chimpanzee genome, revealing a great diversity of shared sequences. Analyzing the positioning of NSs across the chimpanzee genome, we find that 2,807 NSs align confidently within 143 chimpanzee orthologs of human genes. By aligning the whole-genome sequencing data to the chimpanzee genome, we discover ancestral NSs common throughout the Swedish population. The NSs were searched for repeats and repeat elements, revealing a majority of repetitive sequence (56%) and an enrichment of simple repeats (28%) and satellites (15%). Lastly, we align the unmappable reads of a subset of the thousand genomes data to our collection of NSs, as well as to the previously published Pan-African NSs, revealing that both the Swedish and Pan-African NSs are widespread and that the Swedish NSs are largely a subset of the Pan-African NSs. Overall, these results highlight the importance of creating a more diverse reference genome and illustrate that significant amounts of the NSs may be of ancestral origin.
Pages: 18-30
Page count: 13