Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines

Cited by: 528
Authors
Ellrott, Kyle [1]
Bailey, Matthew H. [2]
Saksena, Gordon [3]
Covington, Kyle R. [4]
Kandoth, Cyriac [5]
Stewart, Chip [3]
Hess, Julian [3]
Ma, Singer [7]
Chiotti, Kami E. [1]
McLellan, Michael [2]
Sofia, Heidi J. [6]
Hutter, Carolyn [6]
Getz, Gad [3, 8, 9, 10]
Wheeler, David [4]
Ding, Li [2]
Affiliations
[1] Oregon Hlth & Sci Univ, Biomed Engn, Portland, OR 97239 USA
[2] Washington Univ, Sch Med, Dept Med, McDonnell Genome Inst, Siteman Canc Ctr, St Louis, MO 63110 USA
[3] Eli & Edythe L Broad Inst Massachusetts Inst Tech, Cambridge, MA 02142 USA
[4] Baylor Coll Med, Human Genome Sequencing Ctr, Dept Mol & Human Genet, 1 Baylor Plaza, Houston, TX 77030 USA
[5] Mem Sloan Kettering Canc Ctr, Marie Jose & Henry R Kravis Ctr Mol Oncol, 1275 York Ave, New York, NY 10021 USA
[6] NHGRI, NIH, Bethesda, MD 20892 USA
[7] DNAnexus, 1975 W El Camino Real, Suite 204, Mountain View, CA 94040 USA
[8] Massachusetts Gen Hosp, Canc Ctr, Boston, MA 02129 USA
[9] Massachusetts Gen Hosp, Dept Pathol, Boston, MA 02129 USA
[10] Harvard Med Sch, Boston, MA 02115 USA
Keywords
SOMATIC POINT MUTATIONS; CANCER;
DOI
10.1016/j.cels.2018.03.002
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010; 081704;
Abstract
The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects.
Pages: 271 / +
Page count: 18