TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data

Cited by: 2426
Authors
Colaprico, Antonio [1, 2]
Silva, Tiago C. [3, 4]
Olsen, Catharina [1, 2]
Garofano, Luciano [5, 6]
Cava, Claudia [7]
Garolini, Davide [8]
Sabedot, Thais S. [3, 4]
Malta, Tathiane M. [3, 4]
Pagnotta, Stefano M. [5, 9]
Castiglioni, Isabella
Ceccarelli, Michele [10]
Bontempi, Gianluca [1, 2]
Noushmehr, Houtan [3, 4]
Affiliations
[1] Interuniv Inst Bioinformat Brussels, Brussels, Belgium
[2] Univ Libre Bruxelles, Dept Informat, Machine Learning Grp, Brussels, Belgium
[3] Univ Sao Paulo, Ribeirao Preto Med Sch, Dept Genet, Sao Paulo, Brazil
[4] NAP USP, Ctr Integrat Syst Biol CISBi, Sao Paulo, Brazil
[5] Univ Sannio, Dept Sci & Technol, Benevento, Italy
[6] Unltd Software Srl, Naples, Italy
[7] Natl Res Council IBFM CNR, Inst Mol Bioimaging & Physiol, Milan, Italy
[8] Univ Turin, Dept Phys, Phys Complex Syst, I-10124 Turin, Italy
[9] BIOGEM, Bioinformat Lab, Avellino, Italy
[10] HBKU, Qatar Comp Res Inst, Doha, Qatar
Funding
São Paulo Research Foundation, Brazil;
Keywords
SOMATIC GENOMIC LANDSCAPE; CANCER GENOMICS; BIOCONDUCTOR; SOFTWARE;
DOI
10.1093/nar/gkv1507
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's research network, opportunities still exist to implement novel methods, thereby elucidating new biological pathways and diagnostic markers. However, mining the TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges and offer bioinformatics solutions by using a guided workflow to allow users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four different TCGA tumor types (Kidney, Brain, Breast and Colon) as examples, we provide case studies to illustrate examples of reproducibility, integrative analysis and utilization of different Bioconductor packages to advance and accelerate novel discoveries.
Pages: 11