ARKIVO Dataset: A Benchmark for Ontology-based Extraction Tools

Times cited: 0
Authors
Pandolfo, Laura [1]
Pulina, Luca [1]
Affiliation
[1] Univ Sassari, Intelligent Systems DEsign and Applications (IDEA) Lab, Via Muroni 23A, I-07100 Sassari, Italy
Source
PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND TECHNOLOGIES (WEBIST) | 2021
Keywords
Semantic Web; Dataset; Benchmark; Ontology; Information Extraction
DOI
10.5220/0010677000003058
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The amount of data available on the Web has grown significantly in recent years, thus increasing the need for efficient techniques able to retrieve information from data in order to discover valuable and relevant knowledge. In the last decade, the intersection of the Information Extraction and Semantic Web areas has provided new opportunities for improving ontology-based information extraction tools. However, one of the critical aspects in the development and evaluation of this type of system is the limited availability of annotated documents, especially in domains such as history. In this paper we present the current state of our work on building a large, real-world RDF dataset to support the development of ontology-based extraction tools. The dataset is the result of the efforts made within the ARKIVO project and contains about 300,000 triples, produced through a manual annotation process carried out by domain experts. The ARKIVO dataset is freely available and can be used as a benchmark for evaluating systems that automatically annotate and extract entities from documents.
Pages: 341-345
Number of pages: 5
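To illustrate how a manually annotated set of triples like the one described in the abstract could serve as a benchmark, the following minimal Python sketch scores an extraction system's output against a gold standard using exact-match precision, recall, and F1. It assumes triples are compared as plain (subject, predicate, object) strings; the `Triple` and `evaluate` names and all `http://example.org/` identifiers are hypothetical and do not come from the ARKIVO dataset or from the authors' evaluation protocol.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """An RDF triple held as plain strings (hypothetical representation)."""
    subject: str
    predicate: str
    obj: str


def evaluate(predicted: set[Triple], gold: set[Triple]) -> dict[str, float]:
    """Exact-match precision, recall and F1 of predicted triples vs. a gold set."""
    true_positives = len(predicted & gold)  # triples found in both sets
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical identifiers for illustration only (not from the ARKIVO dataset).
EX = "http://example.org/"
gold = {
    Triple(EX + "doc1", EX + "mentions", EX + "PersonA"),
    Triple(EX + "doc1", EX + "mentions", EX + "PlaceB"),
    Triple(EX + "doc2", EX + "mentions", EX + "PersonA"),
    Triple(EX + "doc2", EX + "mentions", EX + "EventC"),
}
predicted = {
    Triple(EX + "doc1", EX + "mentions", EX + "PersonA"),
    Triple(EX + "doc1", EX + "mentions", EX + "PlaceB"),
    Triple(EX + "doc2", EX + "mentions", EX + "PlaceB"),  # a false positive
}

scores = evaluate(predicted, gold)
print(scores)  # precision = 2/3, recall = 1/2, F1 = 4/7
```

Exact matching is the simplest possible criterion; a real evaluation might relax it, for example by giving partial credit when the right entity is found with the wrong predicate, which this sketch does not attempt.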