Data Curation with a Focus on Reuse

Cited by: 2
Authors
Esteva, Maria [1]
Sweat, Sandra [2]
Mclay, Robert [1]
Xu, Weijia [1]
Kulasekaran, Sivakumar [1]
Affiliations
[1] Texas Adv Comp Ctr, Austin, TX 78758 USA
[2] Univ Texas Austin, Austin, TX 78712 USA
Source
2016 IEEE/ACM JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL) | 2016
Keywords
Data curation; high performance computing; distributed collections architecture; data publishing and reuse
DOI
10.1145/2910896.2910906
Chinese Library Classification number
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
A dataset from the field of High Performance Computing (HPC) was curated with a focus on facilitating its reuse and on appealing to a broader audience beyond HPC specialists. At an early stage in the research project, the curators gathered requirements from prospective users of the dataset, focusing on how and for which research projects they would reuse the data. Users' needs informed which curation tasks to conduct, which included: adding more information elements to the dataset to expand its content scope; removing personal information; and packaging the data in a size, a format, and at a frequency of delivery that are convenient for access and analysis purposes. The curation tasks are embedded in the software that produces the data, and are implemented as an automated workflow that spans various HPC resources, in which the dataset is generated, processed, and stored, and the Texas ScholarWorks institutional repository, through which the data is published. Within this distributed architecture, the integrated data creation and curation workflow complies with long-term preservation requirements, and is the first one implemented as a collaboration between the supercomputing center, where the data is created on an ongoing basis, and the University Libraries at UT Austin, where it is published. The targeted curation strategy included the design of proof-of-concept data analyses to evaluate whether the curated data met the reuse scenarios proposed by users. The results suggest that the dataset is understandable, and that researchers can use it to answer some of the research questions they posed. Results also pointed to specific elements of the curation strategy that had to be improved and disclosed the difficulties involved in breaking data to new users.
Pages: 45-54
Number of pages: 10