ERVcaller: identifying polymorphic endogenous retrovirus and other transposable element insertions using whole-genome sequencing data

Cited by: 19
Authors
Chen, Xun [1]
Li, Dawei [1,2,3]
Affiliations
[1] Univ Vermont, Dept Microbiol & Mol Genet, Burlington, VT 05405 USA
[2] Univ Vermont, Neurosci Behav & Hlth Initiat, Burlington, VT 05405 USA
[3] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
Keywords
STRUCTURAL VARIATION; DISCOVERY; EVOLUTION; REVEALS; FORMAT
DOI
10.1093/bioinformatics/btz205
Chinese Library Classification (CLC)
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Approximately 8% of the human genome is derived from endogenous retroviruses (ERVs). In recent years, an increasing number of human diseases have been found to be associated with ERVs. However, it remains challenging to accurately detect the full spectrum of polymorphic (unfixed) ERVs using whole-genome sequencing (WGS) data.

Results: We designed a new tool, ERVcaller, to detect and genotype transposable element (TE) insertions, including ERVs, in the human genome. We evaluated ERVcaller using both simulated and real benchmark WGS datasets. Compared to existing tools, ERVcaller consistently obtained both the highest sensitivity and precision for detecting simulated ERV and other TE insertions derived from real polymorphic TE sequences. For the WGS data from the 1000 Genomes Project, ERVcaller detected the largest number of TE insertions per sample based on consensus TE loci. By analyzing the experimentally verified TE insertions, ERVcaller had 94.0% TE detection sensitivity and 96.6% genotyping accuracy. Polymerase chain reaction and Sanger sequencing in a small sample set verified 86.7% of examined insertion statuses and 100% of examined genotypes. In conclusion, ERVcaller is capable of detecting and genotyping TE insertions using WGS data with both high sensitivity and precision. This tool can be applied broadly to other species.
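The abstract reports detection sensitivity, precision and genotyping accuracy against benchmark TE insertions. The Python below is a minimal sketch of how such metrics are commonly computed; it is not ERVcaller's own evaluation code. It assumes a call matches a truth insertion when both are on the same chromosome and within a tolerance window, and it uses greedy one-to-one matching. The `Insertion` record, the `evaluate` function and the 100 bp window are hypothetical choices made for illustration, not values taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical tolerance (assumption, not from the ERVcaller paper): a call matches
# a truth insertion if both are on the same chromosome and at most this many bp apart.
WINDOW_BP = 100


@dataclass(frozen=True)
class Insertion:
    chrom: str
    pos: int
    genotype: str  # VCF-style genotype string, e.g. "0/1" (heterozygous) or "1/1"


def evaluate(calls, truth, window=WINDOW_BP):
    """Return (sensitivity, precision, genotyping_accuracy) for insertion calls.

    Greedy one-to-one matching: calls are processed in order, and each call claims
    the nearest still-unmatched truth insertion within `window` bp, if any.
    """
    unmatched = list(truth)
    matched_pairs = []
    for call in calls:
        best = None
        for t in unmatched:
            if t.chrom == call.chrom and abs(t.pos - call.pos) <= window:
                if best is None or abs(t.pos - call.pos) < abs(best.pos - call.pos):
                    best = t
        if best is not None:
            unmatched.remove(best)  # a truth insertion can be matched only once
            matched_pairs.append((call, best))

    tp = len(matched_pairs)
    sensitivity = tp / len(truth) if truth else 0.0  # TP / all true insertions
    precision = tp / len(calls) if calls else 0.0    # TP / all reported calls
    # Genotyping accuracy is assessed only on correctly detected insertions.
    concordant = sum(1 for c, t in matched_pairs if c.genotype == t.genotype)
    gt_accuracy = concordant / tp if tp else 0.0
    return sensitivity, precision, gt_accuracy


# Toy example with made-up coordinates (illustrative only).
truth = [
    Insertion("chr1", 1000, "0/1"),
    Insertion("chr1", 5000, "1/1"),
    Insertion("chr2", 300, "0/1"),
    Insertion("chr3", 42, "0/1"),
]
calls = [
    Insertion("chr1", 1030, "0/1"),  # matches chr1:1000, genotype agrees
    Insertion("chr1", 5200, "1/1"),  # 200 bp away: outside the window, a false positive
    Insertion("chr2", 310, "1/1"),   # matches chr2:300, genotype disagrees
]
print(evaluate(calls, truth))  # → (0.5, 0.6666666666666666, 0.5)
```

Greedy nearest-neighbour matching keeps the sketch short. A stricter benchmark might instead use an optimal assignment between calls and truth sites, or also require the TE family (for example, ERV versus Alu) to agree before counting a match.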
Pages: 3913–3922
Number of pages: 10