Data Lakes, Clouds, and Commons: A Review of Platforms for Analyzing and Sharing Genomic Data

Times cited: 37
Authors
Grossman, Robert L. [1]
Affiliations
[1] Univ Chicago, Ctr Translat Data Sci, 900 East 57th St, KCBD 10142, Chicago, IL 60637 USA
Keywords
CANCER; VISION; BROWSER; GALAXY
DOI
10.1016/j.tig.2018.12.006
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Data commons collate data with cloud computing infrastructure and commonly used software services, tools, and applications to create biomedical resources for the large-scale management, analysis, harmonization, and sharing of biomedical data. Over the past few years, data commons have been used to analyze, harmonize, and share large-scale genomics datasets. Data ecosystems can be built by interoperating multiple data commons. It can be quite labor intensive to curate, import, and analyze the data in a data commons. Data lakes provide an alternative to data commons and simply provide access to data, with the data curation and analysis deferred until later and delegated to those that access the data. We review software platforms for managing, analyzing, and sharing genomic data, with an emphasis on data commons, but also cover data ecosystems and data lakes.
Pages: 223-234
Number of pages: 12