Generating public transport data based on population distributions for RDF benchmarking

Cited by: 3
Authors
Taelman, Ruben [1]
Colpaert, Pieter [1]
Mannens, Erik [1]
Verborgh, Ruben [1]
Affiliations
[1] Univ Ghent, IMEC, IDLab, Technol pk Zwijnaarde 15, B-9052 Ghent, Belgium
Funding
European Union Horizon 2020
Keywords
Public transport; dataset generator; benchmarking; RDF; linked data
DOI
10.3233/SW-180319
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
When benchmarking RDF data management systems such as public transport route planners, system evaluation needs to happen under various realistic circumstances, which requires a wide range of datasets with different properties. Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and inflexible for testing. For these reasons, synthetic dataset generators are typically preferred over real-world datasets due to their intrinsic flexibility. Unfortunately, many synthetic datasets that are generated within benchmarks are insufficiently realistic, raising questions about the generalizability of benchmark results to real-world scenarios. In order to benchmark geospatial and temporal RDF data management systems such as route planners with sufficient external validity and depth, we designed PODiGG, a highly configurable generation algorithm for synthetic public transport datasets with realistic geospatial and temporal characteristics comparable to those of their real-world variants. The algorithm is inspired by real-world public transit network design and scheduling methodologies. This article discusses the design and implementation of PODiGG and validates the properties of its generated datasets. Our findings show that the generator achieves a sufficient level of realism, based on the existing coherence metric and new metrics we introduce specifically for the public transport domain. Thereby, PODiGG provides a flexible foundation for benchmarking RDF data management systems with geospatial and temporal data.
Pages: 305-328
Page count: 24