NeoMutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer

Cited by: 31
Authors
Anzar, Irantzu [1]
Sverchkova, Angelina [1]
Stratford, Richard [1]
Clancy, Trevor [1]
Affiliations
[1] OncoImmunity AS, Oslo Canc Cluster, Ullernchausseen 64-66, N-0379 Oslo, Norway
Keywords
Somatic variant detection; Machine learning; Cancer genomics; Precision medicine; POINT MUTATIONS; IDENTIFICATION; ALGORITHMS; DISCOVERY; VARIANTS; PIPELINE;
DOI
10.1186/s12920-019-0508-5
CLC classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Background: The accurate screening of tumor genomic landscapes for somatic mutations using high-throughput sequencing is a crucial step in precise clinical diagnosis and targeted therapy. However, the complex inherent features of cancer tissue, especially intra-tumor genetic heterogeneity, coupled with sequencing and alignment artifacts, make somatic variant calling a challenging task. Current variant filtering strategies, such as rule-based filtering and consensus voting across different algorithms, have helped to increase specificity, although this comes at the cost of sensitivity.
Methods: In light of this, we have developed the NeoMutate framework, which incorporates 7 supervised machine learning (ML) algorithms to exploit the strengths of multiple variant callers, using a non-redundant set of biological and sequence features. We benchmarked NeoMutate by simulating more than 10,000 bona fide cancer-related mutations into three well-characterized Genome in a Bottle (GIAB) reference samples.
Results: A robust and exhaustive evaluation of NeoMutate's performance, based on 5-fold cross-validation experiments in addition to 3 independent tests, demonstrated substantially improved variant detection accuracy compared with any of its individual component variant callers and with consensus calling of multiple tools.
Conclusions: We show here that integrating multiple tools in an ensemble ML layer optimizes somatic variant detection rates, leading to a potentially improved variant selection framework for the diagnosis and treatment of cancer.
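The trade-off the abstract attributes to consensus voting can be illustrated with a small sketch. It is not NeoMutate's method or data: the three callers, their calls, and the truth labels below are invented for illustration, and `consensus_metrics` is a hypothetical helper. It only shows that raising the number of callers that must agree raises specificity while lowering sensitivity.

```python
# Toy illustration (hypothetical data): each candidate variant carries the
# 0/1 calls of three imaginary variant callers and a ground-truth label.
CANDIDATES = [
    ((1, 1, 1), True),
    ((1, 1, 0), True),
    ((1, 0, 0), True),
    ((0, 1, 0), True),
    ((0, 0, 1), False),
    ((1, 1, 0), False),
    ((0, 0, 0), False),
    ((1, 0, 0), False),
]


def consensus_metrics(candidates, min_votes):
    """Return (sensitivity, specificity) when a variant is accepted only
    if at least `min_votes` callers reported it."""
    tp = fn = tn = fp = 0
    for calls, is_true_variant in candidates:
        accepted = sum(calls) >= min_votes
        if is_true_variant:
            if accepted:
                tp += 1
            else:
                fn += 1
        else:
            if accepted:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


for k in (1, 2, 3):
    sensitivity, specificity = consensus_metrics(CANDIDATES, k)
    print(f"min_votes={k}: sensitivity={sensitivity:.2f}, "
          f"specificity={specificity:.2f}")
```

On this toy set, requiring all three callers to agree removes every false positive but keeps only one of the four true variants. That loss of sensitivity is what the abstract says an ensemble ML layer aims to avoid, by learning from the callers' outputs and additional features instead of applying a fixed vote threshold.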
Pages: 14