A Maximum-Likelihood Approach to Estimating the Insertion Frequencies of Transposable Elements from Population Sequencing Data

Cited by: 3
Authors
Jiang, Xiaoqian [1]
Tang, Haixu [2]
Ismail, Wazim Mohammed [2]
Lynch, Michael [3]
Affiliations
[1] Indiana Univ, Dept Biol, Bloomington, IN 47405 USA
[2] Indiana Univ, Sch Informat & Comp, Bloomington, IN 47405 USA
[3] Arizona State Univ, Ctr Mech Evolut, Tempe, AZ USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
transposable elements; insertion polymorphism; purifying selection; maximum-likelihood; population genomics; DROSOPHILA-MELANOGASTER; GENETIC ELEMENTS; GENOME; POLYMORPHISM; RESISTANCE; PATTERNS; DELETION; REGIONS; RATES;
DOI
10.1093/molbev/msy152
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 ; 081704 ;
Abstract
Transposable elements (TEs) contribute to a large fraction of the expansion of many eukaryotic genomes because TEs can duplicate themselves through transposition. A first step toward understanding the roles of TEs in a eukaryotic genome is to characterize the population-wide variation of TE insertions in the species. Here, we present a maximum-likelihood (ML) method for estimating allele frequencies and detecting selection on TE insertions in a diploid population, based on the genotypes at TE insertion sites detected in multiple individuals sampled from the population using paired-end (PE) sequencing reads. Tests of the method on simulated data show that it can accurately estimate the allele frequencies of TE insertions even when the PE sequencing is conducted at a relatively low coverage (= 5X). The method can also detect TE insertions under strong selection, and its detection ability increases with the sample size drawn from the population, although a substantial fraction of the TE insertions actually under selection may go undetected. Application of the ML method to genomic sequencing data collected from a natural Daphnia pulex population shows that, on the one hand, most (> 90%) TE insertions present in the reference D. pulex genome are either fixed or nearly fixed (with allele frequencies > 0.95); on the other hand, among the nonreference TE insertions (i.e., those detected in some individuals in the population but absent from the reference genome), the majority (> 70%) are still at low frequencies (< 0.1). Finally, we detected a substantial fraction (~9%) of nonreference TE insertions under selection.
Pages: 2560-2571
Page count: 12