MutScan: fast detection and visualization of target mutations by scanning FASTQ data

Cited by: 14
Authors
Chen, Shifu [1, 2, 3]
Huang, Tanxiao [2]
Wen, Tiexiang [1]
Li, Hong [2]
Xu, Mingyan [2]
Gu, Jia [1]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[2] HaploX Biotechnol, Shenzhen, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
National Science Foundation (USA);
Keywords
MutScan; Mutation scan; Variant visualization; Fast detection; TUMOR DNA; CANCER; KRAS;
DOI
10.1186/s12859-018-2024-6
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Background: Some types of clinical genetic tests, such as cancer testing using circulating tumor DNA (ctDNA), require sensitive detection of known target mutations. However, conventional next-generation sequencing (NGS) data analysis pipelines typically involve several filtering steps, which may cause key low-frequency mutations to be missed. Key mutations detected by bioinformatics pipelines also need to be validated. This is typically done with alignment visualization tools such as IGV or GenomeBrowse, but these tools are too heavyweight to be suitable for validating mutations in ultra-deep sequencing data.
Results: We developed MutScan to address the problems of sensitive detection and efficient validation of target mutations. MutScan uses highly optimized string-searching algorithms to scan input FASTQ files and collect all reads that support target mutations. The supporting reads collected for each target mutation are piled up and visualized using web technologies such as HTML and JavaScript. Algorithms such as rolling hash and Bloom filter are applied to accelerate scanning, so that MutScan can detect and visualize target mutations very quickly.
Conclusion: MutScan is a tool for the detection and visualization of target mutations that works by directly scanning raw FASTQ data. Compared to conventional pipelines, it runs about 20 times faster and offers maximal sensitivity, since it can capture mutations supported by even a single read. MutScan visualizes detected mutations by generating interactive pile-ups using web technologies. These pile-ups can be used to validate target mutations and so avoid false positives. Furthermore, MutScan can render all mutation records in a VCF file as HTML pages for cloud-friendly VCF validation.
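The abstract names rolling hash and Bloom filter as the techniques that accelerate read scanning but gives no detail. The following is a minimal Python sketch of how the two can be combined to collect reads containing a target sequence. It is not MutScan's actual implementation: the function and class names (`kmer_hash`, `BloomFilter`, `scan_reads`), the hash parameters, and the restriction that all targets share one length are assumptions made for illustration.

```python
from typing import Dict, Iterable, List

BASE = 4
MOD = (1 << 61) - 1  # large prime modulus (illustrative choice)
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # other symbols (e.g. N) map to 0


def kmer_hash(seq: str) -> int:
    """Polynomial hash of a whole sequence."""
    h = 0
    for ch in seq:
        h = (h * BASE + CODE.get(ch, 0)) % MOD
    return h


class BloomFilter:
    """Small Bloom filter over integer hashes, backed by a bytearray."""

    def __init__(self, n_bits: int = 1 << 16, n_hashes: int = 3) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, value: int) -> Iterable[int]:
        # Derive several bit positions from one integer (double hashing).
        h1 = value % self.n_bits
        h2 = 1 + (value // self.n_bits) % (self.n_bits - 1)
        for i in range(self.n_hashes):
            yield (h1 + i * h2) % self.n_bits

    def add(self, value: int) -> None:
        for p in self._positions(value):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, value: int) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(value))


def scan_reads(reads: Iterable[str],
               targets: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each target name to the reads that contain its sequence.

    Assumption of this sketch: every target has the same length k.
    """
    k = len(next(iter(targets.values())))
    if any(len(t) != k for t in targets.values()):
        raise ValueError("all targets must have the same length")

    bloom = BloomFilter()
    by_hash: Dict[int, List[str]] = {}
    for name, seq in targets.items():
        h = kmer_hash(seq)
        bloom.add(h)
        by_hash.setdefault(h, []).append(name)

    top = pow(BASE, k - 1, MOD)  # weight of the leading base in a window
    hits: Dict[str, List[str]] = {name: [] for name in targets}
    for read in reads:
        if len(read) < k:
            continue
        matched = set()
        h = kmer_hash(read[:k])
        for i in range(len(read) - k + 1):
            if i > 0:
                # Roll the window: drop read[i-1], append read[i+k-1].
                out_code = CODE.get(read[i - 1], 0)
                in_code = CODE.get(read[i + k - 1], 0)
                h = ((h - out_code * top) * BASE + in_code) % MOD
            if h in bloom:  # cheap pre-filter; may give false positives
                for name in by_hash.get(h, []):
                    if read[i:i + k] == targets[name]:  # exact check
                        matched.add(name)
        for name in matched:
            hits[name].append(read)
    return hits
```

In this sketch the rolling hash updates each window's hash in constant time, the Bloom filter rejects most windows without a dictionary lookup, and the final string comparison removes any false positives, so a read is reported only if it truly contains the target sequence.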
Pages: 11