CONGA: Copy number variation genotyping in ancient genomes and low-coverage sequencing data

Cited by: 1
Authors
Soylev, Arda [1, 2]
Cokoglu, Sevim Seda [3 ]
Koptekin, Dilek [4 ]
Alkan, Can [5 ]
Somel, Mehmet [3 ]
Affiliations
[1] Konya Food & Agr Univ, Dept Comp Engn, Konya, Turkey
[2] Heinrich Heine Univ, Med Fac, Inst Med Biometry & Bioinformat, Dusseldorf, Germany
[3] Middle East Tech Univ, Dept Biol, Ankara, Turkey
[4] Middle East Tech Univ, Grad Sch Informat, Dept Hlth Informat, Ankara, Turkey
[5] Bilkent Univ, Dept Comp Engn, Ankara, Turkey
Funding
European Research Council;
Keywords
STRUCTURAL VARIATION; ADAPTIVE EVOLUTION; EARLY FARMERS; ADMIXTURE; DNA; DISCOVERY; HISTORY; POLYMORPHISM; FRAMEWORK; DELETION;
DOI
10.1371/journal.pcbi.1010788
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
To date, ancient genome analyses have been largely confined to the study of single nucleotide polymorphisms (SNPs). Copy number variants (CNVs) are a major contributor to disease and to evolutionary adaptation, but identifying CNVs in ancient shotgun-sequenced genomes is hampered by typically low genome coverage (<1x) and short fragments (<80 bps), which prevent standard CNV detection software from being applied effectively to ancient genomes. Here we present CONGA, tailored for genotyping CNVs at low coverage. Simulations and down-sampling experiments suggest that CONGA can genotype deletions >1 kbps with F-scores >0.75 at >= 1x, and distinguish between heterozygous and homozygous states. We used CONGA to genotype 10,002 outgroup-ascertained deletions across a heterogeneous set of 71 ancient human genomes spanning the last 50,000 years, produced using variable experimental protocols. A fraction of these (21/71) display divergent deletion profiles unrelated to their population origin, but attributable to technical factors such as coverage and read length. The majority of the sample (50/71), despite originating from nine different laboratories and having coverages ranging from 0.44x to 26x (median 4x) and average read lengths of 52-121 bps (median 69), exhibit coherent deletion frequencies. Across these 50 genomes, inter-individual genetic diversity estimates based on SNPs and on CONGA-genotyped deletions are highly correlated. CONGA-genotyped deletions also display purifying selection signatures, as expected. CONGA thus paves the way for systematic CNV analyses in ancient genomes, despite the technical challenges posed by low and variable genome coverage.
Author summary
In parallel with developments in genomic technologies over the last decades, ancient genomics has opened a new era in understanding the evolutionary history of populations and species. However, the field still needs novel computational methods for accurate and effective use of ancient genome data, which is mostly low-coverage and more challenging to analyse than modern-day genomes. To date, single nucleotide polymorphisms (SNPs) have been the main source of information analysed in ancient genome studies. This is despite copy number variants (CNVs) harboring at least as much information as SNPs, especially with respect to natural selection. Here we developed CONGA, an algorithm for genotyping deletions and duplications in low-coverage genomes. We assessed its accuracy using simulations (with ancient-like data), and also studied its performance among 71 real ancient human genomes from different laboratories. We found that the common practice of authors filtering their ancient genome data before publishing prevents the reliable identification of duplications. Meanwhile, large (>1,000 base-pair) deletions can be detected even at quite low coverage (e.g. 0.5x). Deletions called in ancient genomes reflect population history and also show signs of negative selection.
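The abstract reports genotyping accuracy as F-scores. As a minimal sketch, assuming this refers to the usual F1 measure (the harmonic mean of precision and recall), the calculation looks as follows. The function name `f_score` and the counts are hypothetical and are not taken from the paper or from the CONGA software.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp: true deletions that were genotyped as present
    fp: deletions genotyped as present that are not real
    fn: true deletions that were missed
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only, not results from the paper.
example = f_score(tp=80, fp=20, fn=30)
print(f"F-score: {example:.3f}")
```

Under this definition, an F-score above 0.75 requires that both precision and recall be reasonably high, since the harmonic mean is pulled towards the lower of the two.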
Pages: 32