Optimized pipeline of MuTect and GATK tools to improve the detection of somatic single nucleotide polymorphisms in whole-exome sequencing data

Cited by: 92
Authors
do Valle, Italo Faria [1, 2]
Giampieri, Enrico [1 ]
Simonetti, Giorgia [3 ]
Padella, Antonella [3 ]
Manfrini, Marco [3 ]
Ferrari, Anna [3 ]
Papayannidis, Cristina [3 ]
Zironi, Isabella [1 ]
Garonzi, Marianna [4 ]
Bernardi, Simona [5 ]
Delledonne, Massimo [4, 6]
Martinelli, Giovanni [3 ]
Remondini, Daniel [1 ]
Castellani, Gastone [1 ]
Affiliations
[1] Univ Bologna, Dept Phys & Astron, Bologna, Italy
[2] Minist Educ Brazil, CAPES Fdn, Brasilia, DF, Brazil
[3] Univ Bologna, Dept Expt Diagnost & Specialty Med, Bologna, Italy
[4] Univ Verona, Dept Biotechnol, Verona, Italy
[5] Univ Brescia, Dept Clin & Expt Sci, Unit Blood Dis & Stem Cell Transplantat, Brescia, Italy
[6] Personal Genom, Verona, Italy
Keywords
Cancer; Somatic single nucleotide variants; Whole exome sequencing; LUNG ADENOCARCINOMA; CANCER; LANDSCAPE; VARIANTS; MUTATIONS; ANNOTATION; CHALLENGES; PROFILE; TUMORS
DOI
10.1186/s12859-016-1190-7
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Detecting somatic mutations in whole-exome sequencing data of cancer samples has become a popular approach for profiling cancer development, progression and chemotherapy resistance. Several studies have proposed software packages, filters and parametrizations for this task. However, many research groups have reported low concordance among different methods. We aimed to develop a pipeline that detects a wide range of single nucleotide mutations with high validation rates. We combined two standard tools, the Genome Analysis Toolkit (GATK) and MuTect, to create the GATK-LODN method. As proof of principle, we applied our pipeline to exome sequencing data of hematological (Acute Myeloid and Acute Lymphoblastic Leukemias) and solid (Gastrointestinal Stromal Tumor and Lung Adenocarcinoma) tumors. We also performed experiments on simulated data to test the sensitivity and specificity of our pipeline.
Results: MuTect presented the highest validation rate (90%) for mutation detection, but detected a limited number of somatic mutations. GATK detected a high number of mutations but with low specificity. GATK-LODN improved the performance of GATK variant detection (from 5 of 14 to 3 of 4 confirmed variants), while preserving mutations not detected by MuTect. However, GATK-LODN filtered more variants in the hematological samples than in the solid tumors. Experiments on simulated data demonstrated that GATK-LODN increased both the specificity and sensitivity of GATK results.
Conclusion: We presented a pipeline that detects a wide range of somatic single nucleotide variants, with good validation rates, from exome sequencing data of cancer samples. We also showed the advantage of combining standard algorithms to create the GATK-LODN method, which increased the specificity and sensitivity of GATK results. This pipeline can be helpful in discovery studies aimed at profiling the somatic mutational landscape of cancer genomes.
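The Results quote GATK and GATK-LODN validation as counts (5 of 14 and 3 of 4 confirmed variants) rather than as rates. The short sketch below converts those counts to proportions so they can be compared with the 90% figure reported for MuTect. The function name `validation_rate` is a hypothetical helper introduced here for illustration, not part of the published pipeline; only the counts come from the abstract.

```python
from fractions import Fraction


def validation_rate(confirmed: int, tested: int) -> Fraction:
    """Return the fraction of tested variants that were confirmed.

    Hypothetical helper for illustration; not part of the authors' pipeline.
    """
    if tested <= 0:
        raise ValueError("tested must be a positive integer")
    if not 0 <= confirmed <= tested:
        raise ValueError("confirmed must be between 0 and tested")
    return Fraction(confirmed, tested)


# Counts taken from the Results paragraph of the abstract.
gatk = validation_rate(5, 14)
gatk_lodn = validation_rate(3, 4)

print(f"GATK:      {float(gatk):.1%}")       # 35.7%
print(f"GATK-LODN: {float(gatk_lodn):.1%}")  # 75.0%
```

Both proportions rest on very few validated variants (14 and 4), so they are better read as an indication of the direction of the change than as precise estimates.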
Pages: 9