DBpedia and the live extraction of structured data from Wikipedia

Cited by: 54
Authors
Morsey, Mohamed [1 ]
Lehmann, Jens
Auer, Soeren [1 ]
Stadler, Claus
Hellmann, Sebastian
Affiliations
[1] Univ Leipzig, Dept Comp Sci, Res Grp, Leipzig, Germany
Keywords
Knowledge extraction; RDF; Wikipedia; Triplestore; Knowledge management; Data management; Databases; Websites
DOI
10.1108/00330331211221828
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline classification code
0812
Abstract
Purpose - DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases, and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight, and releases are sometimes based on data that is several months old. This paper presents DBpedia-Live, which solves this problem with a live synchronization method based on the update stream of Wikipedia.

Design/methodology/approach - Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and writes the extracted data back to DBpedia. It also publishes the newly added and deleted triples in files, enabling synchronization between the DBpedia endpoint and other DBpedia mirrors.

Findings - During the realization of DBpedia-Live the authors learned that it is crucial to process Wikipedia updates in a priority queue: recently updated Wikipedia articles should receive the highest priority, ahead of mapping changes and unmodified pages. An overall finding is that the emerging Web of Data offers plenty of opportunities for librarians.

Practical implications - DBpedia has had, and continues to have, a great effect on the Web of Data and has become a crystallization point for it. Many companies and researchers use DBpedia and its public services to improve their applications and research. The DBpedia-Live framework improves DBpedia further by keeping it synchronized with Wikipedia in a timely fashion, which matters for the many use cases that require up-to-date information.

Originality/value - The new DBpedia-Live framework adds features that the old DBpedia-Live framework lacked, e.g. abstract extraction, handling of ontology changes, and publication of changesets.
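The priority-queue finding lends itself to a short illustration. The following is a minimal sketch, not the actual DBpedia-Live implementation: it assumes three illustrative priority levels (live edits, mapping changes, unmodified pages scheduled for re-extraction) and shows how a heap keeps recently edited articles ahead of lower-priority work.

```python
import heapq
import itertools

# Priority levels mirroring the paper's finding: live Wikipedia edits first,
# then mapping (ontology) changes, then unmodified pages queued for periodic
# re-extraction. The names and level values are assumptions for illustration.
PRIO_LIVE_EDIT = 0
PRIO_MAPPING_CHANGE = 1
PRIO_UNMODIFIED = 2


class UpdateQueue:
    """A minimal priority queue over page updates (illustrative sketch)."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so pages with equal priority keep FIFO order.
        self._counter = itertools.count()

    def push(self, page_title, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), page_title))

    def pop(self):
        priority, _, page_title = heapq.heappop(self._heap)
        return page_title, priority

    def __len__(self):
        return len(self._heap)


queue = UpdateQueue()
queue.push("Berlin", PRIO_UNMODIFIED)            # scheduled re-extraction
queue.push("Leipzig", PRIO_LIVE_EDIT)            # just edited on Wikipedia
queue.push("Mapping:City", PRIO_MAPPING_CHANGE)  # a mapping was changed

while queue:
    print(queue.pop())
# ('Leipzig', 0) is processed first, then ('Mapping:City', 1), then ('Berlin', 2)
```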
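The changeset-based mirror synchronization can be sketched in the same spirit. The snippet below assumes, purely as an illustration, that a changeset is a pair of gzipped N-Triples files listing added and removed triples (DBpedia-Live publishes changeset files of this kind, but the file names and layout here are assumptions), and applies them to a local rdflib graph, removals first so that an updated triple ends up with its new value.

```python
import gzip
from rdflib import Graph


def apply_changeset(store: Graph, added_path: str, removed_path: str) -> None:
    """Apply one changeset (a pair of gzipped N-Triples files) to a local copy.

    Removals are applied before additions, so a triple whose object changed
    (old value removed, new value added) ends up with the new value. The
    *.added.nt.gz / *.removed.nt.gz pairing is an assumption of this sketch;
    check the actual changeset layout of the mirror you synchronize from.
    """
    removed = Graph()
    with gzip.open(removed_path, "rt", encoding="utf-8") as f:
        removed.parse(data=f.read(), format="nt")
    for triple in removed:
        store.remove(triple)

    added = Graph()
    with gzip.open(added_path, "rt", encoding="utf-8") as f:
        added.parse(data=f.read(), format="nt")
    for triple in added:
        store.add(triple)


# Usage: keep a local graph in sync by applying changesets in publication order
# (the changeset file names here are hypothetical).
local = Graph()
apply_changeset(local, "000001.added.nt.gz", "000001.removed.nt.gz")
```

Applying changesets strictly in publication order is what keeps a mirror consistent with the endpoint; skipping or reordering them can resurrect deleted triples.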
Pages: 157-181 (25 pages)