Generating public transport data based on population distributions for RDF benchmarking

Cited by: 3
Authors
Taelman, Ruben [1]
Colpaert, Pieter [1]
Mannens, Erik [1]
Verborgh, Ruben [1]
Affiliations
[1] Univ Ghent, IMEC, IDLab, Technol pk Zwijnaarde 15, B-9052 Ghent, Belgium
Funding
European Union Horizon 2020
Keywords
Public transport; dataset generator; benchmarking; RDF; linked data
DOI
10.3233/SW-180319
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When benchmarking RDF data management systems such as public transport route planners, system evaluation needs to happen under various realistic circumstances, which requires a wide range of datasets with different properties. Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and inflexible for testing. For these reasons, synthetic dataset generators are typically preferred over real-world datasets because of their intrinsic flexibility. Unfortunately, many synthetic datasets that are generated within benchmarks are insufficiently realistic, raising questions about the generalizability of benchmark results to real-world scenarios. In order to benchmark geospatial and temporal RDF data management systems such as route planners with sufficient external validity and depth, we designed PoDiGG, a highly configurable generation algorithm for synthetic public transport datasets with realistic geospatial and temporal characteristics comparable to those of their real-world variants. The algorithm is inspired by real-world public transit network design and scheduling methodologies. This article discusses the design and implementation of PoDiGG and validates the properties of its generated datasets. Our findings show that the generator achieves a sufficient level of realism, based on the existing coherence metric and new metrics we introduce specifically for the public transport domain. PoDiGG thereby provides a flexible foundation for benchmarking RDF data management systems with geospatial and temporal data.
Pages: 305-328
Page count: 24