Data Curation with a Focus on Reuse

Authors
Esteva, Maria [1]
Sweat, Sandra [2]
Mclay, Robert [1]
Xu, Weijia [1]
Kulasekaran, Sivakumar [1]
Affiliations
[1] Texas Adv Comp Ctr, Austin, TX 78758 USA
[2] Univ Texas Austin, Austin, TX 78712 USA
Source
2016 IEEE/ACM JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL) | 2016
Keywords
Data curation; high performance computing; distributed collections architecture; data publishing and reuse
DOI
10.1145/2910896.2910906
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
A dataset from the field of High Performance Computing (HPC) was curated with a focus on facilitating its reuse and on appealing to a broader audience beyond HPC specialists. At an early stage in the research project, the curators gathered requirements from prospective users of the dataset, focusing on how and for which research projects they would reuse the data. User needs informed which curation tasks to conduct, which included: adding more information elements to the dataset to expand its content scope; removing personal information; and packaging the data in a size, a format, and at a frequency of delivery that are convenient for access and analysis. The curation tasks are embedded in the software that produces the data and are implemented as an automated workflow that spans various HPC resources, in which the dataset is generated, processed, and stored, and the Texas ScholarWorks institutional repository, through which the data is published. Within this distributed architecture, the integrated data creation and curation workflow complies with long-term preservation requirements, and is the first one implemented as a collaboration between the supercomputing center, where the data is created on an ongoing basis, and the University Libraries at UT Austin, where it is published. The targeted curation strategy included the design of proof-of-concept data analyses to evaluate whether the curated data met the reuse scenarios proposed by users. The results suggest that the dataset is understandable and that researchers can use it to answer some of the research questions they posed. Results also pointed to specific elements of the curation strategy that had to be improved and disclosed the difficulties involved in breaking data to new users.
Pages: 45-54
Page count: 10