A comprehensive framework for detecting copy number variants from single nucleotide polymorphism data: 'rCNV', a versatile R package for paralogue and CNV detection

Cited by: 4
Authors
Karunarathne, Piyal [1, 2, 3, 5]
Zhou, Qiujie [1, 2]
Schliep, Klaus [4]
Milesi, Pascal [1, 2, 5]
Affiliations
[1] Uppsala Univ, Dept Ecol & Genet, Plant Ecol & Evolut, Uppsala, Sweden
[2] Sci Life Lab SciLifeLab, Uppsala, Sweden
[3] Heinrich Heine Univ, Inst Populat Genet, Dusseldorf, Germany
[4] Graz Univ Technol, Inst Comp Biotechnol, Graz, Austria
[5] Uppsala Univ, Dept Ecol & Genet, Plant Ecol & Evolut, Norbyvagen 18D, S-75236 Uppsala, Sweden
Keywords
CNVs; GBS; paralogues; R statistics; SNPs; GENE DUPLICATION; SEQUENCING TECHNOLOGIES; GENOMICS; CAPTURE; PROPORTION; DIVERSITY; EVOLUTION; PATTERNS; EXOME; TOOL
DOI
10.1111/1755-0998.13843
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
Recent studies have highlighted the significant role of copy number variants (CNVs) in phenotypic diversity, environmental adaptation and species divergence across eukaryotes. The presence of CNVs can also introduce genotyping biases, which pose challenges to accurate population and quantitative genetic analyses. However, detecting CNVs in genomes, particularly in non-model organisms, presents a formidable challenge. To address this issue, we have developed a statistical framework and an accompanying R software package that leverage allelic-read depth from single nucleotide polymorphism (SNP) data for accurate CNV detection. Our framework capitalises on two key principles. First, it exploits the distribution of allelic-read depth ratios in heterozygotes for individual SNPs by comparing it against an expected distribution based on binomial sampling. Second, it identifies SNPs exhibiting an apparent excess of heterozygotes under Hardy-Weinberg equilibrium. By employing multiple statistical tests, our method not only enhances sensitivity to sampling effects but also effectively addresses reference biases, resulting in optimised SNP classification. Our framework is compatible with various NGS technologies (e.g. RADseq, Exome-capture). This versatility enables CNV calling from genomes of diverse complexities. To streamline the analysis process, we have implemented our framework in the user-friendly R package 'rCNV', which automates the entire workflow. We trained our models using simulated data and validated their performance on four datasets derived from different sequencing technologies, including RADseq (Chinook salmon, Oncorhynchus tshawytscha), Rapture (American lobster, Homarus americanus), Exome-capture (Norway spruce, Picea abies) and WGS (Malaria mosquito, Anopheles gambiae).
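The two principles named in the abstract can be illustrated with a small, self-contained sketch. This is not the rCNV implementation or its API: the function names (`allelic_ratio_test`, `het_excess_test`), the pooling of read counts across heterozygotes, the expected reference-read ratio of 0.5, and the chi-square cutoff of 3.84 (1 degree of freedom, 5% level) are all assumptions made for illustration, and Python is used here although the package itself is written in R.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


def allelic_ratio_test(het_depths, expected_ratio=0.5):
    """Principle 1 (illustrative): pool (ref, alt) read counts over the
    heterozygotes at one SNP and compare the reference-read share with
    the share expected under binomial sampling."""
    ref_total = sum(ref for ref, _ in het_depths)
    depth_total = sum(ref + alt for ref, alt in het_depths)
    p_value = binom_two_sided_p(ref_total, depth_total, expected_ratio)
    return ref_total / depth_total, p_value


def het_excess_test(n_hom_ref, n_het, n_hom_alt, chi2_cutoff=3.84):
    """Principle 2 (illustrative): chi-square comparison of genotype
    counts with Hardy-Weinberg expectations, flagging only an excess
    (not a deficit) of heterozygotes. The cutoff is an assumption."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    excess = n_het > expected[1] and chi2 > chi2_cutoff
    return expected[1], chi2, excess


if __name__ == "__main__":
    # Hypothetical SNP: reads in heterozygotes are skewed towards the
    # reference allele, and heterozygotes are over-represented.
    ratio, p_value = allelic_ratio_test([(30, 10), (28, 12), (33, 9)])
    exp_het, chi2, excess = het_excess_test(10, 80, 10)
    print(f"ref-read ratio={ratio:.3f}, binomial p={p_value:.2e}")
    print(f"expected het={exp_het:.1f}, chi2={chi2:.1f}, excess={excess}")
```

A SNP that fails both checks, as the hypothetical one above does, is the kind of site such a framework would treat as a candidate paralogue or CNV; a real analysis would also need to account for reference bias, sequencing error and multiple testing, which this sketch leaves out.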
Pages: 1772-1789
Page count: 18