Data Curation with a Focus on Reuse

Cited by: 2
Authors
Esteva, Maria [1]
Sweat, Sandra [2]
McLay, Robert [1]
Xu, Weijia [1]
Kulasekaran, Sivakumar [1]
Affiliations
[1] Texas Adv Comp Ctr, Austin, TX 78758 USA
[2] Univ Texas Austin, Austin, TX 78712 USA
Source
2016 IEEE/ACM JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL) | 2016
Keywords
Data curation; high performance computing; distributed collections architecture; data publishing and reuse
DOI
10.1145/2910896.2910906
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
A dataset from the field of High Performance Computing (HPC) was curated with a focus on facilitating its reuse and appealing to a broader audience beyond HPC specialists. At an early stage in the research project, the curators gathered requirements from prospective users of the dataset, focusing on how and for which research projects they would reuse the data. Users' needs informed which curation tasks to conduct, which included: adding more information elements to the dataset to expand its content scope; removing personal information; and packaging the data in a size, a format, and at a frequency of delivery that are convenient for access and analysis. The curation tasks are embedded in the software that produces the data and are implemented as an automated workflow that spans various HPC resources, in which the dataset is generated, processed, and stored, and the Texas ScholarWorks institutional repository, through which the data is published. Within this distributed architecture, the integrated data creation and curation workflow complies with long-term preservation requirements, and is the first one implemented as a collaboration between the supercomputing center, where the data is created on an ongoing basis, and the University Libraries at UT Austin, where it is published. The targeted curation strategy included the design of proof-of-concept data analyses to evaluate whether the curated data met the reuse scenarios proposed by users. The results suggest that the dataset is understandable, and that researchers can use it to answer some of the research questions they posed. Results also pointed to specific elements of the curation strategy that had to be improved and disclosed the difficulties involved in breaking data to new users.
Pages: 45-54
Page count: 10