Identification of indels in next-generation sequencing data

Cited by: 35
Authors
Ratan, Aakrosh [1, 5, 6]
Olson, Thomas L. [2, 3, 4]
Loughran, Thomas P., Jr. [2, 3, 4]
Miller, Webb [1]
Affiliations
[1] Penn State Univ, Ctr Comparat Genom & Bioinformat, University Pk, PA 16802 USA
[2] Univ Virginia, Dept Med, Charlottesville, VA 22908 USA
[3] Univ Virginia, Dept Hematol & Oncol, Charlottesville, VA 22908 USA
[4] Univ Virginia, Univ Virginia Canc Ctr, Charlottesville, VA 22908 USA
[5] Univ Virginia, Dept Publ Hlth Sci, Charlottesville, VA 22908 USA
[6] Univ Virginia, Ctr Publ Hlth Gen, Charlottesville, VA 22908 USA
Source
BMC BIOINFORMATICS | 2015 / Vol. 16
Keywords
Indels; Variants; Sequencing analysis; Structural variation; Small insertions; Short-read; Deletions
DOI
10.1186/s12859-015-0483-6
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The discovery and mapping of genomic variants is an essential step in most analyses of sequencing reads. A number of mature software packages and associated pipelines can identify single nucleotide polymorphisms (SNPs) with a high degree of concordance. However, the same cannot be said for tools used to identify other types of variants. Indels are the second most frequent class of variants in the human genome, after SNPs. Reliable detection of indels remains a challenging problem, especially for variants longer than a few bases.

Results: We have developed a set of algorithms and heuristics, collectively called indelMINER, to identify indels from whole-genome resequencing datasets using paired-end reads. indelMINER uses a split-read approach to identify the precise breakpoints of indels smaller than a user-specified threshold, and supplements it with a paired-end approach to identify larger variants that the split-read approach frequently misses. We use simulated and real datasets to show that an implementation of the algorithm performs favorably compared with several existing tools.

Conclusions: indelMINER can be used effectively to identify indels in whole-genome resequencing projects. The output is provided in VCF format along with additional information about each variant, including its presence or absence in another sample. The source code and documentation for indelMINER can be freely downloaded from www.bx.psu.edu/miller_lab/indelMINER.tar.gz.
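
The paired-end idea mentioned in the Results can be illustrated with a minimal sketch. This is not indelMINER's implementation; it only shows the general principle that a read pair whose mapped span is much larger than the library's expected insert size hints at a deletion, while a much smaller span hints at an insertion. The names `ReadPair` and `classify_pairs`, the cutoff `k`, and the example coordinates are all hypothetical and chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class ReadPair(NamedTuple):
    """Hypothetical container for one mapped read pair."""
    left_start: int  # leftmost reference coordinate covered by the pair
    right_end: int   # rightmost reference coordinate covered by the pair


def classify_pairs(
    pairs: List[ReadPair],
    expected_mean: float,
    expected_sd: float,
    k: float = 3.0,  # assumed cutoff: number of standard deviations
) -> List[Tuple[ReadPair, str]]:
    """Label each pair by comparing its mapped span to the expected insert size."""
    upper = expected_mean + k * expected_sd
    lower = expected_mean - k * expected_sd
    calls = []
    for pair in pairs:
        span = pair.right_end - pair.left_start
        if span > upper:
            label = "deletion"    # mates map farther apart than expected
        elif span < lower:
            label = "insertion"   # mates map closer together than expected
        else:
            label = "concordant"  # span is consistent with the library
        calls.append((pair, label))
    return calls


if __name__ == "__main__":
    # Made-up coordinates for a library with mean insert size 300 and SD 30.
    example = [ReadPair(100, 400), ReadPair(1000, 2500), ReadPair(5000, 5100)]
    for pair, label in classify_pairs(example, expected_mean=300, expected_sd=30):
        print(pair, label)
```

In practice a caller would cluster many discordant pairs before reporting a variant, and would rely on split reads to place exact breakpoints; the sketch omits both steps.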
Pages: 8