DBpedia and the live extraction of structured data from Wikipedia

Cited by: 54

Authors:

Morsey, Mohamed [1]

Lehmann, Jens

Auer, Soeren [1]

Stadler, Claus

Hellmann, Sebastian

Affiliations:

[1] Univ Leipzig, Dept Comp Sci, Res Grp, Leipzig, Germany

Source:

PROGRAM-ELECTRONIC LIBRARY AND INFORMATION SYSTEMS | 2012 / Vol. 46 / No. 02

Keywords:

Knowledge extraction; RDF; Wikipedia; Triplestore; Knowledge management; Data management; Databases; Websites;

DOI:

10.1108/00330331211221828

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Purpose - DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight, and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. This paper seeks to address these issues.
Design/methodology/approach - Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back to DBpedia. DBpedia-Live publishes the newly added and deleted triples in files, in order to enable synchronization between the DBpedia endpoint and other DBpedia mirrors.
Findings - During the realization of DBpedia-Live, the authors learned that it is crucial to process Wikipedia updates in a priority queue. Recently updated Wikipedia articles should have the highest priority, ahead of mapping changes and unmodified pages. An overall finding is that the emerging Web of Data offers plenty of opportunities for librarians.
Practical implications - DBpedia has had a great effect on the Web of Data and has become a crystallization point for it. Many companies and researchers use DBpedia and its public services to improve their applications and research approaches. The DBpedia-Live framework improves DBpedia further by synchronizing it with Wikipedia in a timely manner, which is relevant for many use cases requiring up-to-date information.
Originality/value - The new DBpedia-Live framework adds new features to the old DBpedia-Live framework, e.g. abstract extraction, ontology changes, and changeset publication.
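
The priority ordering described under Findings can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `UpdateQueue`, the three priority constants, and the helper `drain` are hypothetical names chosen for illustration, and the sketch assumes only what the abstract states, namely that recently updated articles are processed before mapping changes, which in turn are processed before unmodified pages.

```python
import heapq
import itertools

# Hypothetical priority levels (lower number = processed first).
LIVE_UPDATE = 0      # article recently edited on Wikipedia
MAPPING_CHANGE = 1   # article affected by a changed mapping
UNMODIFIED = 2       # article with no recent change


class UpdateQueue:
    """Illustrative priority queue of article titles awaiting extraction."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so that items with the
        # same priority come out in insertion (FIFO) order.
        self._counter = itertools.count()

    def push(self, priority, title):
        heapq.heappush(self._heap, (priority, next(self._counter), title))

    def pop(self):
        priority, _, title = heapq.heappop(self._heap)
        return priority, title

    def __len__(self):
        return len(self._heap)


def drain(queue):
    """Pop every item and return the titles in processing order."""
    order = []
    while len(queue):
        order.append(queue.pop()[1])
    return order
```

With this sketch, pushing an unmodified page first and two live updates afterwards still yields the live updates first, in the order they arrived, followed by the mapping change and then the unmodified page.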


Pages: 157-181

Page count: 25

50 items in total

[21] Structured knowledge creation for Urdu language: A DBpedia approach
Rasham, Shanza
Khan, Habib Ullah
Maqbool, Fahad
Razzaq, Saad
Anwar, Sajid
Ilyas, Muhammad
EXPERT SYSTEMS, 2025, 42 (01)
[22] Extraction of Linked Data Triples from Japanese Wikipedia Text of Ukiyo-e Painters
Kimura, Fuminori
Mitsui, Katsuhiro
Maeda, Akira
2013 INTERNATIONAL CONFERENCE ON CULTURE AND COMPUTING (CULTURE AND COMPUTING 2013), 2013, : 192 - +
[23] Relation Extraction from Wikipedia Leveraging Intrinsic Patterns
Gu, Yulong
Liu, Weidong
Song, Jiaxing
2015 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT), VOL 1, 2015, : 181 - 186
[24] A generic method for multi word extraction from Wikipedia
Bekavac, Bozo
Tadic, Marko
PROCEEDINGS OF THE ITI 2008 30TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES, 2008, : 663 - 667
[25] Hidden revolution of human priorities: An analysis of biographical data from Wikipedia
Reznik, Ilia
Shatalov, Vladimir
JOURNAL OF INFORMETRICS, 2016, 10 (01) : 124 - 131
[26] Information Extraction from Twitter Using DBpedia Ontology: Indonesia Tourism Places
Rosyiq, Ahmad
Hayah, Aina Rahmah
Hidayanto, Achmad Nizar
Naisuty, Meisuchi
Suhanto, Agus
Budi, Nur Fitriah Avuning
2019 INTERNATIONAL CONFERENCE ON INFORMATICS, MULTIMEDIA, CYBER AND INFORMATION SYSTEM (ICIMCIS), 2019, : 91 - 96
[27] Wikipedia citations: A comprehensive data set of citations with identifiers extracted from English Wikipedia
Singh, Harshdeep
West, Robert
Colavizza, Giovanni
QUANTITATIVE SCIENCE STUDIES, 2021, 2 (01) : 1 - 19
[28] RELATED WORD EXTRACTION FROM WIKIPEDIA FOR WEB RETRIEVAL ASSISTANCE
Hori, Kentaro
Oishi, Tetsuya
Mine, Tsunenori
Hasegawa, Ryuzo
Fujita, Hiroshi
Koshimura, Miyuki
ICAART 2010: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 2: AGENTS, 2010, : 192 - 199
[29] A Graph-Structured Dataset for Wikipedia Research
Aspert, Nicolas
Miz, Volodymyr
Ricaud, Benjamin
Vandergheynst, Pierre
COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2019 ), 2019, : 1188 - 1193
[30] Hypernym-Hyponym Relation Extraction from Indonesian Wikipedia Text
Nityasya, Made Nindyatama
Mahendra, Rahmad
Adriani, Mirna
2018 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2018, : 285 - 289
