Generating public transport data based on population distributions for RDF benchmarking

Cited by: 3
Authors
Taelman, Ruben [1]
Colpaert, Pieter [1]
Mannens, Erik [1]
Verborgh, Ruben [1]
Affiliations
[1] Univ Ghent, IMEC, IDLab, Technol pk Zwijnaarde 15, B-9052 Ghent, Belgium
Funding
European Union Horizon 2020;
Keywords
Public Transport; dataset generator; benchmarking; RDF; linked data;
DOI
10.3233/SW-180319
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When benchmarking RDF data management systems such as public transport route planners, system evaluation needs to happen under various realistic circumstances, which requires a wide range of datasets with different properties. Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and inflexible for testing. For these reasons, synthetic dataset generators are typically preferred over real-world datasets due to their intrinsic flexibility. Unfortunately, many synthetic datasets that are generated within benchmarks are insufficiently realistic, raising questions about the generalizability of benchmark results to real-world scenarios. In order to benchmark geospatial and temporal RDF data management systems such as route planners with sufficient external validity and depth, we designed PODiGG, a highly configurable generation algorithm for synthetic public transport datasets with realistic geospatial and temporal characteristics comparable to those of their real-world variants. The algorithm is inspired by real-world public transit network design and scheduling methodologies. This article discusses the design and implementation of PODiGG and validates the properties of its generated datasets. Our findings show that the generator achieves a sufficient level of realism, based on the existing coherence metric and new metrics we introduce specifically for the public transport domain. Thereby, PODiGG provides a flexible foundation for benchmarking RDF data management systems with geospatial and temporal data.
Pages: 305-328
Page count: 24
Related papers
50 in total
  • [21] Spatiotemporal RDF Data Query Based on Subgraph Matching
    Meng, Xiangfu
    Zhu, Lin
    Li, Qing
    Zhang, Xiaoyan
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2021, 10 (12)
  • [22] Query Optimization for massive RDF data based on Spark
    Li, Shaohui
    Shen, Derong
    Kou, Yue
    Yang, Dan
    2018 4TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM 2018), 2018, : 219 - 224
  • [23] An RDF-Based Mediator for Health Data Interoperability
    Kuo, Mu-Hsing
    Kushniruk, Andre
    Borycki, Elizabeth
    MEDICAL INFORMATICS IN A UNITED AND HEALTHY EUROPE, 2009, 150 : 399 - 403
  • [26] Publishing public transport data on the Web with the Linked Connections framework
    Rojas, Julian Andres
    Delva, Harm
    Colpaert, Pieter
    Verborgh, Ruben
    SEMANTIC WEB, 2023, 14 (04) : 659 - 693
  • [27] LinkD: element-based data interlinking of RDF datasets in linked data
    Kettouch, Mohamed Salah
    Luca, Cristina
    COMPUTING, 2022, 104 (12) : 2685 - 2709
  • [28] A Study of RDB-Based RDF Data Management Techniques
    Jalali, Vahid
    Zhou, Mo
    Wu, Yuqing
    WEB-AGE INFORMATION MANAGEMENT, 2011, 6897 : 366 - 378
  • [29] Graph-based Indexing Method for Searching in RDF Data
    Kyu, Khin Myat
    Oo, Aung Nway
    2019 INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION TECHNOLOGIES (ICAIT), 2019, : 96 - 101
  • [30] Compacting Massive Public Transport Data
    Letelier, Benjamin
    Brisaboa, Nieves R.
    Gutierrez-Asorey, Pablo
    Parama, Jose R.
    Rodeiro, Tirso V.
    STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2023, 2023, 14240 : 310 - 322