SweeD: Likelihood-Based Detection of Selective Sweeps in Thousands of Genomes

Times cited: 333
Authors
Pavlidis, Pavlos [1]
Zivkovic, Daniel [2]
Stamatakis, Alexandros [1]
Alachiotis, Nikolaos [1]
Affiliations
[1] Heidelberg Inst Theoret Studies HITS gGmbH, Exelixis Lab, Sci Comp Grp, Heidelberg, Germany
[2] Univ Munich, Sect Evolutionary Biol, Bioctr, Planegg Martinsried, Germany
Keywords
selective sweep; positive selection; high-performance computing; site frequency spectrum; POSITIVE SELECTION; LINKAGE DISEQUILIBRIUM; POPULATION; SEQUENCE; HITCHHIKING; SIMULATION; SIGNATURE; SITES;
DOI
10.1093/molbev/mst112
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The advent of modern DNA sequencing technology is the driving force in obtaining complete intraspecific genomes that can be used to detect loci that have been subject to positive selection in the recent past. Based on selective sweep theory, beneficial loci can be detected by examining the single nucleotide polymorphism patterns in intraspecific genome alignments. In the last decade, a plethora of algorithms for identifying selective sweeps have been developed. However, the majority of these algorithms have not been designed for analyzing whole-genome data. We present SweeD (Sweep Detector), an open-source tool for the rapid detection of selective sweeps in whole genomes. It analyzes site frequency spectra and represents a substantial extension of the widely used SweepFinder program. The sequential version of SweeD is up to 22 times faster than SweepFinder and, more importantly, is able to analyze thousands of sequences. We also provide a parallel implementation of SweeD for multi-core processors. Furthermore, we implemented a checkpointing mechanism that allows SweeD to be deployed on cluster systems with queue execution time restrictions, and long-running analyses to be resumed after processor failures. In addition, the user can specify various demographic models via the command line to calculate their theoretically expected site frequency spectra. Therefore, in contrast to SweepFinder, the neutral site frequencies can optionally be calculated directly from a given demographic model. We show that an increase in sample size results in more precise detection of positive selection. Thus, the ability to analyze substantially larger sample sizes with SweeD leads to more accurate sweep detection. We validate SweeD via simulations and by scanning the first chromosome from the 1000 Genomes Project for selective sweeps. We compare SweeD results with results from a linkage-disequilibrium-based approach and identify common outliers.
Pages: 2224-2234
Page count: 11