DBpedia and the live extraction of structured data from Wikipedia

Cited by: 54
Authors
Morsey, Mohamed [1]
Lehmann, Jens
Auer, Soeren [1]
Stadler, Claus
Hellmann, Sebastian
Affiliations
[1] Univ Leipzig, Dept Comp Sci, Res Grp, Leipzig, Germany
Keywords
Knowledge extraction; RDF; Wikipedia; Triplestore; Knowledge management; Data management; Databases; Websites
DOI
10.1108/00330331211221828
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Purpose - DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. This paper seeks to address these issues.

Design/methodology/approach - Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of articles that were recently updated. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back to DBpedia. DBpedia-Live publishes the newly added and deleted triples in files, in order to enable synchronization between the DBpedia endpoint and other DBpedia mirrors.

Findings - During the realization of DBpedia-Live the authors learned that it is crucial to process Wikipedia updates in a priority queue. Recently updated Wikipedia articles should have the highest priority, over mapping changes and unmodified pages. An overall finding is that there are plenty of opportunities arising from the emerging Web of Data for librarians.

Practical implications - DBpedia had and has a great effect on the Web of Data and became a crystallization point for it. Many companies and researchers use DBpedia and its public services to improve their applications and research approaches. The DBpedia-Live framework improves DBpedia further by synchronizing it with Wikipedia in a timely manner, which is relevant for many use cases requiring up-to-date information.

Originality/value - The new DBpedia-Live framework adds new features to the old DBpedia-Live framework, e.g. abstract extraction, ontology changes, and changesets publication.
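
The priority-queue idea mentioned under Findings can be illustrated with a minimal Python sketch. It is not the DBpedia-Live implementation: the class names `Priority` and `UpdateQueue`, the page titles, and the relative order of mapping changes versus unmodified pages are assumptions made for illustration. The abstract only states that recently updated articles come first.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical priority levels; a lower value is processed first."""
    LIVE_UPDATE = 0     # article recently edited on Wikipedia
    MAPPING_CHANGE = 1  # page affected by a changed mapping (assumed rank)
    UNMODIFIED = 2      # page re-processed without a recent edit (assumed rank)


class UpdateQueue:
    """Priority queue of page titles, FIFO within one priority level."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so that entries with
        # equal priority leave the queue in insertion order.
        self._counter = itertools.count()

    def push(self, title, priority):
        heapq.heappush(self._heap, (int(priority), next(self._counter), title))

    def pop(self):
        priority, _, title = heapq.heappop(self._heap)
        return title, Priority(priority)

    def __len__(self):
        return len(self._heap)


queue = UpdateQueue()
queue.push("Leipzig", Priority.UNMODIFIED)
queue.push("Berlin", Priority.MAPPING_CHANGE)
queue.push("Germany", Priority.LIVE_UPDATE)
queue.push("Saxony", Priority.LIVE_UPDATE)

# Drain the queue: live updates first, then mapping changes, then the rest.
processing_order = [queue.pop()[0] for _ in range(len(queue))]
print(processing_order)  # ['Germany', 'Saxony', 'Berlin', 'Leipzig']
```

The counter keeps the ordering stable, so two live updates are handled in the order they arrived rather than in alphabetical order of their titles.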
Pages: 157-181
Number of pages: 25