Robust and exact structural variation detection with paired-end and soft-clipped alignments: SoftSV compared with eight algorithms

Cited by: 37
Authors
Bartenhagen, Christoph [1]
Dugas, Martin [1]
Affiliation
[1] Univ Munster, Inst Med Informat, Albert Schweitzer Campus 1, D-48149 Munster, Germany
Keywords
structural variation; paired-end sequencing; split-reads; simulation; CANCER; VARIANT; IDENTIFICATION; REARRANGEMENTS; GENOMES;
DOI
10.1093/bib/bbv028
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Structural variation (SV) plays an important role in genetic diversity among the population in general and specifically in diseases such as cancer. Modern next-generation sequencing (NGS) technologies provide paired-end sequencing data at high depth with increasing read lengths. This development enabled the analysis of split-reads to detect SV breakpoints with single-nucleotide resolution. But ambiguous mappings and breakpoint sequences with further co-occurring mutations hamper split-read alignments against a reference sequence. The trade-off between high sensitivity and low false-positive rate is problematic and often requires a lot of fine-tuning of the analysis method based on knowledge about its algorithm and the characteristics of the data set. We present SoftSV, a method for exact breakpoint detection for small and large deletions, inversions, tandem duplications and inter-chromosomal translocations, which relies solely on the mutual alignment of soft-clipped reads within the neighborhood of discordantly mapped paired-end reads. Unlike other SV detection algorithms, our approach does not require thresholds regarding sequencing coverage or mapping quality. We evaluate SoftSV together with eight approaches (Breakdancer, Clever, CREST, Delly, GASVPro, Pindel, Socrates and SoftSearch) on simulated and real data sets. Our results show that sensitive and reliable SV detection is subject to many different factors like read length, sequence coverage and SV type. While most programs have their individual drawbacks, our greedy approach turns out to be the most robust and sensitive on many experimental setups. Sensitivities above 85% and positive predictive values between 80 and 100% could be achieved consistently for all SV types on simulated data sets starting at relatively short 75 bp reads and low 10-15x sequence coverage.
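The core signal named in the abstract, soft-clipped reads found near discordantly mapped read pairs, can be illustrated with a short Python sketch. This is not SoftSV's implementation: the `Read` record, its `proper_pair` field (which could, for example, be derived from SAM FLAG bit 0x2), the `window` size and both helper functions are hypothetical, and the mutual alignment of soft-clipped reads that SoftSV performs is omitted entirely. The sketch only collects candidate breakpoint positions from CIGAR soft clips (`S` operations) that lie within a window of a discordant read.

```python
import re
from dataclasses import dataclass

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


@dataclass
class Read:
    """Hypothetical minimal alignment record (not a real library type)."""
    name: str
    chrom: str
    pos: int           # 1-based leftmost aligned reference position, as in SAM POS
    cigar: str
    proper_pair: bool  # assumption: e.g. taken from SAM FLAG bit 0x2


def clip_breakpoints(read: Read) -> list[int]:
    """Reference positions implied by soft clips at either end of the read.

    Each position is the first reference base to the right of the clip junction.
    """
    # Hard clips carry no sequence, so drop them before looking for soft clips.
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(read.cigar) if op != "H"]
    points = []
    if ops and ops[0][1] == "S":
        # Leading soft clip: junction lies just left of the first aligned base.
        points.append(read.pos)
    if ops and ops[-1][1] == "S":
        # Trailing soft clip: junction lies just right of the last aligned base.
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        points.append(read.pos + ref_len)
    return points


def candidate_breakpoints(reads: list[Read], window: int = 500) -> list[tuple[str, int]]:
    """Soft-clip positions within `window` bp of a discordant read (hypothetical rule)."""
    discordant = [(r.chrom, r.pos) for r in reads if not r.proper_pair]
    candidates = set()
    for r in reads:
        for bp in clip_breakpoints(r):
            if any(c == r.chrom and abs(bp - p) <= window for c, p in discordant):
                candidates.add((r.chrom, bp))  # clips agreeing on a junction collapse
    return sorted(candidates)


if __name__ == "__main__":
    reads = [
        Read("r1", "chr1", 1000, "30S70M", True),
        Read("r2", "chr1", 930, "70M30S", True),
        Read("r3", "chr1", 1200, "100M", False),
        Read("r4", "chr2", 5000, "20S80M", True),
    ]
    print(candidate_breakpoints(reads))  # [('chr1', 1000)]
```

In the toy input, a leading clip at position 1000 and a trailing clip whose aligned part ends at position 999 point to the same junction, so they collapse into one candidate; the clipped read on chr2 has no discordant read nearby and is ignored. A real caller would go on to align the clipped sequences against each other to pin down the exact breakpoint, which this sketch does not attempt.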
Pages: 51-62
Page count: 12
Related papers
27 in total
[1]   BreaKmer: detection of structural variation in targeted massively parallel sequencing data using kmers [J].
Abo, Ryan P. ;
Ducar, Matthew ;
Garcia, Elizabeth P. ;
Thorner, Aaron R. ;
Rojas-Rudilla, Vanesa ;
Lin, Ling ;
Sholl, Lynette M. ;
Hahn, William C. ;
Meyerson, Matthew ;
Lindeman, Neal I. ;
Van Hummelen, Paul ;
MacConaill, Laura E. .
NUCLEIC ACIDS RESEARCH, 2015, 43 (03)
[2]   RSVSim: an R/Bioconductor package for the simulation of structural variations [J].
Bartenhagen, Christoph ;
Dugas, Martin .
BIOINFORMATICS, 2013, 29 (13) :1679-1681
[3]   FINDING ALL CLIQUES OF AN UNDIRECTED GRAPH [H] [J].
BRON, C ;
KERBOSCH, J .
COMMUNICATIONS OF THE ACM, 1973, 16 (09) :575-577
[4]   Chen K, 2009, NAT METHODS, V6, P677, DOI 10.1038/nmeth.1363
[5]   Gillet-Markowska A, 2014, BIOINFORMATICS
[6]   SoftSearch: Integration of Multiple Sequence Features to Identify Breakpoints of Structural Variations [J].
Hart, Steven N. ;
Sarangi, Vivekananda ;
Moore, Raymond ;
Baheti, Saurabh ;
Bhavsar, Jaysheel D. ;
Couch, Fergus J. ;
Kocher, Jean-Pierre A. .
PLOS ONE, 2013, 8 (12)
[7]   Homer N., WHOLE GENOME SIMULAT
[8]   Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes [J].
Hormozdiari, Fereydoun ;
Alkan, Can ;
Eichler, Evan E. ;
Sahinalp, S. Cenk .
GENOME RESEARCH, 2009, 19 (07) :1270-1278
[9]   Detection of large-scale variation in the human genome [J].
Iafrate, AJ ;
Feuk, L ;
Rivera, MN ;
Listewnik, ML ;
Donahoe, PK ;
Qi, Y ;
Scherer, SW ;
Lee, C .
NATURE GENETICS, 2004, 36 (09) :949-951
[10]   Mapping and sequencing of structural variation from eight human genomes (Reprinted from Nature, vol 453, pg 56-64, 2008) [J].
Kidd, Jeffrey M. ;
Cooper, Gregory M. ;
Donahue, William F. ;
Hayden, Hillary S. ;
Sampas, Nick ;
Graves, Tina ;
Hansen, Nancy ;
Teague, Brian ;
Alkan, Can ;
Antonacci, Francesca ;
Haugen, Eric ;
Zerr, Troy ;
Yamada, N. Alice ;
Tsang, Peter ;
Newman, Tera L. ;
Tuzun, Eray ;
Cheng, Ze ;
Ebling, Heather M. ;
Tusneem, Nadeem ;
David, Robert ;
Gillett, Will ;
Phelps, Karen A. ;
Weaver, Molly ;
Saranga, David ;
Brand, Adrianne ;
Tao, Wei ;
Gustafson, Erik ;
McKernan, Kevin ;
Chen, Lin ;
Malig, Maika ;
Smith, Joshua D. ;
Korn, Joshua M. ;
McCarroll, Steven A. ;
Altshuler, David A. ;
Peiffer, Daniel A. ;
Dorschner, Michael ;
Stamatoyannopoulos, John ;
Schwartz, David ;
Nickerson, Deborah A. ;
Mullikin, James C. ;
Wilson, Richard K. ;
Bruhn, Laurakay ;
Olson, Maynard V. ;
Kaul, Rajinder ;
Smith, Douglas R. ;
Eichler, Evan E. .
NATURE GENETICS, 2009, :S22-S30