Data Curation with a Focus on Reuse

Cited by: 2
Authors
Esteva, Maria [1]
Sweat, Sandra [2]
McLay, Robert [1]
Xu, Weijia [1]
Kulasekaran, Sivakumar [1]
Affiliations
[1] Texas Advanced Computing Center, Austin, TX 78758 USA
[2] University of Texas at Austin, Austin, TX 78712 USA
Source
2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL) | 2016
Keywords
Data curation; high performance computing; distributed collections architecture; data publishing and reuse
DOI
10.1145/2910896.2910906
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
A dataset from the field of High Performance Computing (HPC) was curated with a focus on facilitating its reuse and on appealing to a broader audience beyond HPC specialists. At an early stage in the research project, the curators gathered requirements from prospective users of the dataset, focusing on how and for which research projects they would reuse the data. Users' needs informed which curation tasks to conduct, which included: adding more information elements to the dataset to expand its content scope; removing personal information; and packaging the data in a size, a format, and at a frequency of delivery that are convenient for access and analysis. The curation tasks are embedded in the software that produces the data and are implemented as an automated workflow that spans various HPC resources, in which the dataset is generated, processed, and stored, and the Texas ScholarWorks institutional repository, through which the data is published. Within this distributed architecture, the integrated data creation and curation workflow complies with long-term preservation requirements, and is the first one implemented as a collaboration between the supercomputing center, where the data is created on an ongoing basis, and the University Libraries at UT Austin, where it is published. The targeted curation strategy included the design of proof-of-concept data analyses to evaluate whether the curated data met the reuse scenarios proposed by users. The results suggest that the dataset is understandable and that researchers can use it to answer some of the research questions they posed. Results also pointed to specific elements of the curation strategy that had to be improved and disclosed the difficulties involved in breaking data to new users.
Pages: 45-54
Page count: 10