Analysis of genomic rearrangements by using the Burrows-Wheeler transform of short-read data

Cited by: 6
Authors
Kimura, Kouichi [1]
Koike, Asako [1]
Affiliations
[1] Hitachi Ltd, Res & Dev Grp, Ctr Technol Innovat, Biosyst Res Dept, Kokubunji, Tokyo 1858601, Japan
Source
BMC BIOINFORMATICS | 2015 / Vol. 16
Keywords
STRUCTURAL VARIATION; CANCER GENOMES; COMPRESSION; RESOLUTION; ALGORITHM;
DOI
10.1186/1471-2105-16-S18-S5
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: The potential utility of the Burrows-Wheeler transform (BWT) of a large amount of short-read data ("reads") has not been fully studied. The BWT basically serves as a lossless dictionary of reads, unlike the heuristic and lossy reads-to-genome mapping results conventionally obtained in the first step of sequence analysis. Thus, it is naturally expected to lead to the development of sensitive methods for analysis of short-read data. Recently, one of the most active areas of research in sequence analysis is sensitive detection of rare genomic rearrangements from whole-genome sequencing (WGS) data of heterogeneous cancer samples. The application of the BWT of reads to the analysis of genomic rearrangements is addressed in this study. Results: A new method for sensitive detection of genomic rearrangements by using the BWT of reads in the following three steps is proposed: first, breakpoint regions, which contain breakpoints and are joined together by rearrangement, are predicted from the distribution of so-called discordant pairs by using a kind of conjugate gradient method; second, reads partially matching the breakpoint regions are collected from the BWT of reads; and third, breakpoints are detected as branching points among the collected reads, and their precise positions are determined. The method was experimentally implemented, and its performance (i.e., sensitivity and specificity) was evaluated by using simulated data with known artificial rearrangements. It was applied to publicly available real biological WGS data of cancer patients, and the detection results were compared with published results. Conclusions: Serving as a lossless dictionary of reads, the BWT of short reads enables sensitive analysis of genomic rearrangements in heterogeneous cancer-genome samples when used in conjunction with breakpoint-region predictions based on a conjugate gradient method.
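The second and third steps of the abstract (collecting reads that partially match a breakpoint region from the BWT of reads, then locating the point where they branch away from the reference) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `build_fm_index`, `backward_search`, `reads_containing`, and `branch_offset`, the example sequences, and the naive suffix sorting are all assumptions made for illustration, and the first step (breakpoint-region prediction from discordant pairs) is omitted.

```python
from bisect import bisect_right

def build_fm_index(reads):
    """Naive FM-index over all reads, each terminated by '$' (toy scale only)."""
    text = "".join(r + "$" for r in reads)
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])    # suffix array by direct sorting
    bwt = "".join(text[i - 1] for i in sa)           # character preceding each suffix
    alphabet = sorted(set(text))
    C, total = {}, 0                                 # C[c]: count of characters < c
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    occ = {c: [0] * (n + 1) for c in alphabet}       # occ[c][i]: count of c in bwt[:i]
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return text, sa, C, occ

def backward_search(pattern, C, occ, n):
    """Return the half-open suffix-array interval of suffixes starting with pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi

def reads_containing(pattern, reads):
    """Indices of reads that contain pattern (pattern must not contain '$')."""
    text, sa, C, occ = build_fm_index(reads)
    starts, pos = [], 0
    for r in reads:                                  # start offset of each read in text
        starts.append(pos)
        pos += len(r) + 1
    lo, hi = backward_search(pattern, C, occ, len(text))
    return sorted({bisect_right(starts, sa[i]) - 1 for i in range(lo, hi)})

def branch_offset(ref, read, seed):
    """Reference index where read first diverges to the right of seed, else None."""
    i = ref.find(seed) + len(seed)
    j = read.find(seed) + len(seed)
    while i < len(ref) and j < len(read):
        if ref[i] != read[j]:
            return i
        i, j = i + 1, j + 1
    return None

# Hypothetical breakpoint region and reads, invented for this example.
ref = "TTGACCGATAGC"
reads = ["GACCGATAG", "ACCGAGGCTT", "CCGAGGCTTA", "GGGGGGGG"]
hits = reads_containing("CCGA", reads)
branches = [branch_offset(ref, reads[k], "CCGA") for k in hits]
```

In this sketch, reads that follow the reference yield `None`, while reads that agree on the same diverging offset hint at a candidate breakpoint at that reference position. A real pipeline would use a compressed BWT built without explicit suffix sorting and would also extend leftward and handle sequencing errors.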
Pages: 12