Effective Tooling for Linked Data Publishing in Scientific Research

Cited by: 6

Authors:

Purohit, Sumit [1]

Smith, William [1]

Chappell, Alan [1]

Stephan, Eric [1]

West, Patrick [2]

Lee, Benno [2]

Fox, Peter [2]

Affiliations:

[1] Pacific Northwest Natl Lab, Richland, WA 99354 USA

[2] Rensselaer Polytech Inst, Tetherless World Constellat, Troy, NY 12180 USA

Source:

2016 IEEE TENTH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC) | 2016

Keywords:

Linked Data Publishing; Semantic Data Curation; Data Publishing Tools; Data Discovery; BENCHMARK; ACCESS; SYSTEM;

DOI:

10.1109/ICSC.2016.87

Chinese Library Classification (CLC):

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Challenges that make it difficult to find, share, and combine published data, such as data heterogeneity and resource discovery, have led to increased adoption of semantic data standards and data publishing technologies. To make data more accessible, interconnected, and discoverable, some domains are being encouraged to publish their data as Linked Data. Consequently, this trend greatly increases the amount of data that semantic web tools are required to process, store, and interconnect. In attempting to process and manipulate large data sets, tools, ranging from simple text editors to modern triplestores, eventually break down upon reaching undefined thresholds. This paper shares our experiences in curating metadata, primarily to illustrate the challenges and resulting limitations that data publishers and consumers face in the current technological environment. This paper also provides a Linked Data-based solution to the research problem of resource discovery, and offers a systematic approach that data publishers can take to select suitable tools to meet their data publishing needs. We present a real-world use case, the Resource Discovery for Extreme Scale Collaboration (RDESC), which features a scientific dataset (maximum size of 1.4 billion triples) used to evaluate a toolbox for data publishing in climate research. This paper also introduces a semantic data publishing software suite developed for the RDESC project.


Pages: 24-31

Page count: 8
