A Case for Data Commons: Toward Data Science as a Service

Cited by: 47
Authors
Grossman, Robert L. [1 ,2 ,3 ]
Heath, Allison [4 ]
Murphy, Mark [1 ]
Patterson, Maria [1 ]
Wells, Walt [5 ]
Affiliations
[1] Univ Chicago, Ctr Data Intens Sci, Chicago, IL 60637 USA
[2] Univ Chicago, Div Biol Sci, Chicago, IL 60637 USA
[3] Univ Chicago, Computat Inst, Chicago, IL 60637 USA
[4] Univ Chicago, Ctr Data Intens Sci, Res, Chicago, IL 60637 USA
[5] Ctr Computat Sci Res, New York, NY USA
Funding
U.S. National Institutes of Health; U.S. National Science Foundation;
Keywords
cloud computing; data as a service; data commons; science as a service; scientific computing; software as services;
DOI
10.1109/MCSE.2016.92
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Data commons collocate data, storage, and computing infrastructure with core services and commonly used tools and applications for managing, analyzing, and sharing data to create an interoperable resource for the research community. An architecture for data commons is described, as well as some lessons learned from operating several large-scale data commons.
Pages: 10-20
Page count: 11
Related papers
50 in total
  • [21] Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing
    Mendez, Kevin M.
    Pritchard, Leighton
    Reinke, Stacey N.
    Broadhurst, David I.
    METABOLOMICS, 2019, 15 (10)
  • [22] The National Sleep Research Resource: towards a sleep data commons
    Zhang, Guo-Qiang
    Cui, Licong
    Mueller, Remo
    Tao, Shiqiang
    Kim, Matthew
    Rueschman, Michael
    Mariani, Sara
    Mobley, Daniel
    Redline, Susan
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2018, 25 (10) : 1351 - 1358
  • [23] The National Sleep Research Resource: Towards a Sleep Data Commons
    Zhang, Guo-Qiang
    Cui, Licong
    Mueller, Remo
    Tao, Shiqiang
    Kim, Matthew
    Rueschman, Michael
    Mariani, Sara
    Mobley, Daniel
    Redline, Susan
    ACM-BCB'18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2018, : 572 - 572
  • [25] Application of Data Science for Controlling Energy Crises: A Case Study of Pakistan
    Ullah, Saif
    Asif, Muhammad
    Ahmad, Shahbaz
    Imdad, Ulfat
    Sohaib, Osama
    2019 8TH INTERNATIONAL CONFERENCE ON SOFTWARE AND COMPUTER APPLICATIONS (ICSCA 2019), 2019, : 60 - 64
  • [26] Scaling Data Science Solutions with Semantics and Machine Learning: Bosch Case
    Zhou, Baifan
    Nikolov, Nikolay
    Zheng, Zhuoxun
    Luo, Xianghui
    Savkovic, Ognjen
    Roman, Dumitru
    Soylu, Ahmet
    Kharlamov, Evgeny
    SEMANTIC WEB, ISWC 2023, PT II, 2023, 14266 : 380 - 399
  • [27] Toward a Global Data Infrastructure
    Mor, Nitesh
    Zhang, Ben
    Kolb, John
    Chan, Douglas S.
    Goyal, Nikhil
    Sun, Nicholas
    Lutz, Ken
    Allman, Eric
    Wawrzynek, John
    Lee, Edward A.
    Kubiatowicz, John
    IEEE INTERNET COMPUTING, 2016, 20 (03) : 54 - 62
  • [28] The Veterans Precision Oncology Data Commons: Transforming VA data into a national resource for research in precision oncology
    Do, Nhan
    Grossman, Robert
    Feldman, Theodore
    Fillmore, Nathanael
    Elbers, Danne
    Tuck, David
    Dhond, Rupali
    Selva, Luis
    Meng, Frank
    Fitzsimons, Michael
    Ajjarapu, Samuel
    Ayandeh, Siamack
    Hall, Robert
    Do, Stephanie
    Brophy, Mary
    SEMINARS IN ONCOLOGY, 2019, 46 (4-5) : 314 - 320
  • [29] How to Improve Research Data Management The Case of Sciebo (Science Box)
    Wilms, Konstantin
    Meske, Christian
    Stieglitz, Stefan
    Rudolph, Dominik
    Vogl, Raimund
    HUMAN INTERFACE AND THE MANAGEMENT OF INFORMATION: APPLICATIONS AND SERVICES, PT II, 2016, 9735 : 434 - 442
  • [30] Data Value Chain as a Service Framework: for Enabling Data Handling, Data Security and Data Analysis in the Cloud
    Kasim, Henry
    Hung, Terence
    Li, Xiaorong
    PROCEEDINGS OF THE 2012 IEEE 18TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2012), 2012, : 804 - 809