PBHoney: identifying genomic variants via long-read discordance and interrupted mapping

Cited by: 105
Authors
English, Adam C. [1]
Salerno, William J. [1]
Reid, Jeffrey G. [1]
Affiliations
[1] Baylor Coll Med, Human Genome Sequencing Ctr, Houston, TX 77030 USA
Source
BMC BIOINFORMATICS | 2014 / Vol. 15
Keywords
Structural variation; Sequencing; Pacific Biosciences; COPY-NUMBER VARIATION; STRUCTURAL VARIATION; RESOLUTION; ALIGNMENT
DOI
10.1186/1471-2105-15-180
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: As resequencing projects become more prevalent across a larger number of species, accurate variant identification will further elucidate the nature of genetic diversity and become increasingly relevant in genomic studies. However, the identification of larger genomic variants via DNA sequencing is limited by both the incomplete information provided by sequencing reads and the nature of the genome itself. Long-read sequencing technologies provide high-resolution access to structural variants often inaccessible to shorter reads. Results: We present PBHoney, software that considers both intra-read discordance and soft-clipped tails of long reads (>10,000 bp) to identify structural variants. As a proof of concept, we identify four structural variants and two genomic features in a strain of Escherichia coli with PBHoney and validate them via de novo assembly. PBHoney is available for download at http://sourceforge.net/projects/pb-jelly/. Conclusions: By implementing two variant-identification approaches that exploit the high mappability of long reads, PBHoney is shown to be effective at detecting larger structural variants using whole-genome Pacific Biosciences RS II Continuous Long Reads. Furthermore, PBHoney is able to discover two genomic features: the presence of the Rac prophage in the isolate and evidence of the circular E. coli genome.
Pages: 7