ERVcaller: identifying polymorphic endogenous retrovirus and other transposable element insertions using whole-genome sequencing data

Authors
Chen, Xun [1]
Li, Dawei [1,2,3]
Affiliations
[1] Univ Vermont, Dept Microbiol & Mol Genet, Burlington, VT 05405 USA
[2] Univ Vermont, Neurosci Behav & Hlth Initiat, Burlington, VT 05405 USA
[3] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
Keywords
STRUCTURAL VARIATION; DISCOVERY; EVOLUTION; REVEALS; FORMAT
DOI
10.1093/bioinformatics/btz205
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Approximately 8% of the human genome is derived from endogenous retroviruses (ERVs). In recent years, an increasing number of human diseases have been found to be associated with ERVs. However, it remains challenging to accurately detect the full spectrum of polymorphic (unfixed) ERVs using whole-genome sequencing (WGS) data.

Results: We designed a new tool, ERVcaller, to detect and genotype transposable element (TE) insertions, including ERVs, in the human genome. We evaluated ERVcaller using both simulated and real benchmark WGS datasets. Compared to existing tools, ERVcaller consistently obtained both the highest sensitivity and precision for detecting simulated ERV and other TE insertions derived from real polymorphic TE sequences. For the WGS data from the 1000 Genomes Project, ERVcaller detected the largest number of TE insertions per sample based on consensus TE loci. By analyzing the experimentally verified TE insertions, ERVcaller had 94.0% TE detection sensitivity and 96.6% genotyping accuracy. Polymerase chain reaction and Sanger sequencing in a small sample set verified 86.7% of examined insertion statuses and 100% of examined genotypes. In conclusion, ERVcaller is capable of detecting and genotyping TE insertions using WGS data with both high sensitivity and precision. This tool can be applied broadly to other species.
Pages: 3913-3922
Number of pages: 10