CONGA: Copy number variation genotyping in ancient genomes and low-coverage sequencing data

Cited by: 1
Authors
Soylev, Arda [1, 2]
Cokoglu, Sevim Seda [3]
Koptekin, Dilek [4]
Alkan, Can [5]
Somel, Mehmet [3]
Affiliations
[1] Konya Food & Agr Univ, Dept Comp Engn, Konya, Turkey
[2] Heinrich Heine Univ, Med Fac, Inst Med Biometry & Bioinformat, Dusseldorf, Germany
[3] Middle East Tech Univ, Dept Biol, Ankara, Turkey
[4] Middle East Tech Univ, Grad Sch Informat, Dept Hlth Informat, Ankara, Turkey
[5] Bilkent Univ, Dept Comp Engn, Ankara, Turkey
Funding
European Research Council;
Keywords
STRUCTURAL VARIATION; ADAPTIVE EVOLUTION; EARLY FARMERS; ADMIXTURE; DNA; DISCOVERY; HISTORY; POLYMORPHISM; FRAMEWORK; DELETION;
DOI
10.1371/journal.pcbi.1010788
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
To date, ancient genome analyses have been largely confined to the study of single nucleotide polymorphisms (SNPs). Copy number variants (CNVs) are a major contributor to disease and to evolutionary adaptation, but identifying CNVs in ancient shotgun-sequenced genomes is hampered by typically low genome coverage (<1x) and short fragments (<80 bps), which prevent standard CNV detection software from being applied effectively to ancient genomes. Here we present CONGA, tailored for genotyping CNVs at low coverage. Simulations and down-sampling experiments suggest that CONGA can genotype deletions >1 kbps with F-scores >0.75 at >= 1x, and distinguish between heterozygous and homozygous states. We used CONGA to genotype 10,002 outgroup-ascertained deletions across a heterogeneous set of 71 ancient human genomes spanning the last 50,000 years, produced using variable experimental protocols. A fraction of these (21/71) display divergent deletion profiles unrelated to their population origin, but attributable to technical factors such as coverage and read length. The majority of the sample (50/71), despite originating from nine different laboratories and having coverages ranging from 0.44x-26x (median 4x) and average read lengths of 52-121 bps (median 69), exhibit coherent deletion frequencies. Across these 50 genomes, inter-individual genetic diversity estimates based on SNPs and on CONGA-genotyped deletions are highly correlated. CONGA-genotyped deletions also display purifying selection signatures, as expected. CONGA thus paves the way for systematic CNV analyses in ancient genomes, despite the technical challenges posed by low and variable genome coverage.
Author summary
In parallel with developments in genomic technologies over the last decades, ancient genomics opened a new era in understanding the evolutionary history of populations and species. However, the field still needs novel computational methods for accurate and effective use of ancient genome data, which is mostly low-coverage and more challenging to analyse than modern-day genomes. Single nucleotide polymorphisms (SNPs) have so far been the main source of information analysed in ancient genome studies. This is despite copy number variants (CNVs) harboring at least as much information as SNPs, especially with respect to natural selection. Here we developed CONGA, an algorithm for genotyping deletions and duplications in low-coverage genomes. We assessed its accuracy using simulations (with ancient-like data), and also studied its performance among 71 real ancient human genomes from different laboratories. We found that the common practice of authors filtering their ancient genome data before publishing prevents the reliable identification of duplications. Meanwhile, large (>1,000 base-pair) deletions can be detected even at quite low coverage (e.g. 0.5x). Deletions called in ancient genomes reflect population history and also show signs of negative selection.
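The abstract reports accuracy as an F-score (for example, >0.75 for deletions >1 kbps at >= 1x coverage). As a rough illustration of what that metric measures, the sketch below computes precision, recall, and the F-score by comparing a set of genotyped deletions against a known truth set, as one might do in a simulation. The function name `f_score` and the deletion identifiers are hypothetical and are not taken from CONGA; the record does not describe how the authors actually matched calls to true events.

```python
def f_score(truth, calls):
    """Return (precision, recall, F-score) for a set of calls against a truth set.

    truth, calls: sets of hashable deletion identifiers (hypothetical labels;
    a real evaluation would need a rule for matching genomic intervals).
    """
    true_pos = len(truth & calls)    # deletions called and truly present
    false_pos = len(calls - truth)   # deletions called but absent from truth
    false_neg = len(truth - calls)   # true deletions that were missed

    precision = true_pos / (true_pos + false_pos) if calls else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: 4 simulated deletions, 4 calls, 3 of them correct.
simulated = {"del1", "del2", "del3", "del4"}
genotyped = {"del1", "del2", "del3", "del5"}
precision, recall, f1 = f_score(simulated, genotyped)
print(precision, recall, f1)
```

In this toy case three of four calls are correct and three of four true deletions are recovered, so precision, recall, and the F-score all equal 0.75, the threshold quoted in the abstract.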
Pages: 32