Identification of indels in next-generation sequencing data

Cited by: 35
Authors
Ratan, Aakrosh [1, 5, 6]
Olson, Thomas L. [2, 3, 4]
Loughran, Thomas P., Jr. [2, 3, 4]
Miller, Webb [1]
Affiliations
[1] Penn State Univ, Ctr Comparat Genom & Bioinformat, University Pk, PA 16802 USA
[2] Univ Virginia, Dept Med, Charlottesville, VA 22908 USA
[3] Univ Virginia, Dept Hematol & Oncol, Charlottesville, VA 22908 USA
[4] Univ Virginia, Univ Virginia Canc Ctr, Charlottesville, VA 22908 USA
[5] Univ Virginia, Dept Publ Hlth Sci, Charlottesville, VA 22908 USA
[6] Univ Virginia, Ctr Publ Hlth Gen, Charlottesville, VA 22908 USA
Source
BMC Bioinformatics | 2015 / Volume 16
Keywords
Indels; Variants; Sequencing analysis; Structural variation; Small insertions; Short-read; Deletions
DOI
10.1186/s12859-015-0483-6
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: The discovery and mapping of genomic variants is an essential step in most analyses that use sequencing reads. A number of mature software packages and associated pipelines can identify single nucleotide polymorphisms (SNPs) with a high degree of concordance. The same cannot be said for tools that identify other types of variants. Indels are the second most frequent class of variants in the human genome, after SNPs, yet their reliable detection remains a challenging problem, especially for variants longer than a few bases.
Results: We have developed a set of algorithms and heuristics, collectively called indelMINER, to identify indels from whole-genome resequencing datasets using paired-end reads. indelMINER uses a split-read approach to identify the precise breakpoints of indels smaller than a user-specified threshold, and supplements it with a paired-end approach to identify larger variants that the split-read approach frequently misses. Using simulated and real datasets, we show that an implementation of the algorithm performs favorably when compared with several existing tools.
Conclusions: indelMINER can be used effectively to identify indels in whole-genome resequencing projects. The output is provided in VCF format along with additional information about each variant, including its presence or absence in another sample. The source code and documentation for indelMINER can be freely downloaded from www.bx.psu.edu/miller_lab/indelMINER.tar.gz.
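The two signals named in the abstract can be illustrated with a toy sketch. This is not indelMINER's implementation: the function names, the `ReadPair` type, the exact-match rule, and all parameters below are assumptions made for illustration only. The split-read function assumes a read whose prefix is already anchored on the reference and searches for a deletion that lets the remaining suffix match; the paired-end function flags pairs whose mapped span differs from an expected insert size by more than a tolerance.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record for a mapped read pair."""
    name: str
    left_start: int  # leftmost mapped reference position (0-based)
    right_end: int   # rightmost mapped reference position (exclusive)


def split_read_deletion(ref, read, anchor, max_del):
    """Toy split-read search using exact matching only.

    Assumes the read's prefix aligns to `ref` starting at `anchor`.
    Returns (breakpoint, deletion_size), or None if no deletion of
    size 1..max_del explains the read.
    """
    # Extend the prefix match as far as it goes.
    k = 0
    while k < len(read) and anchor + k < len(ref) and ref[anchor + k] == read[k]:
        k += 1
    if k == len(read):
        return None  # the whole read matches without a gap
    suffix = read[k:]
    # Try skipping d reference bases so that the suffix matches exactly.
    for d in range(1, max_del + 1):
        start = anchor + k + d
        if ref[start:start + len(suffix)] == suffix:
            return (anchor + k, d)
    return None


def discordant_pairs(pairs, expected_insert, tolerance):
    """Toy paired-end signal based on the mapped span of each pair.

    A span much larger than expected suggests a deletion in the sample;
    a span much smaller suggests an insertion.
    """
    calls = []
    for p in pairs:
        delta = (p.right_end - p.left_start) - expected_insert
        if delta > tolerance:
            calls.append((p.name, "deletion", delta))
        elif delta < -tolerance:
            calls.append((p.name, "insertion", -delta))
    return calls
```

In this sketch a sequencing error or SNP would be mistaken for a breakpoint, and a single discordant pair would count as evidence; a real caller would presumably tolerate mismatches and require support from several reads before reporting a variant.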
Pages: 8