RevOnt: Reverse engineering of competency questions from knowledge graphs via language models

Cited by: 2
Authors
Ciroku, Fiorela [1]
de Berardinis, Jacopo [2]
Kim, Jongmo [2]
Merono-Penuela, Albert [2]
Presutti, Valentina [1]
Simperl, Elena [2]
Affiliations
[1] University of Bologna, Alma Mater Studiorum, Bologna, Italy
[2] King's College London, London, England
Source
Journal of Web Semantics | 2024 / Vol. 82
Funding
European Union Horizon 2020
Keywords
Knowledge engineering; Knowledge graph; Ontology development; Competency question extraction; Ontology
DOI
10.1016/j.websem.2024.100822
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The process of developing ontologies (formal, explicit specifications of a shared conceptualisation) is addressed by well-known methodologies. As in any engineering development, its fundamental basis is the collection of requirements, which includes the elicitation of competency questions. Competency questions are defined by interacting with domain and application experts or by investigating existing datasets that may be used to populate the ontology, i.e. its knowledge graph. The rise in popularity and accessibility of knowledge graphs provides an opportunity to support this phase with automatic tools. In this work, we explore the possibility of extracting competency questions from a knowledge graph. This reverses the traditional workflow, in which knowledge graphs are built from ontologies, which in turn are engineered from competency questions. We describe in detail RevOnt, an approach that extracts and abstracts triples from a knowledge graph, generates questions based on triple verbalisations, and filters the resulting questions to yield a meaningful set of competency questions. The approach is implemented using the Wikidata knowledge graph as a use case, and contributes a set of core competency questions from 20 domains present in the WDV dataset. To evaluate RevOnt, we contribute a new dataset of manually annotated, high-quality competency questions, and compare the extracted competency questions against these human references by calculating their BLEU scores. The results for the abstraction and question generation components of the approach show good to high quality. Meanwhile, the accuracy of the filtering component is above 86%, which is comparable to state-of-the-art classification results.
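The evaluation step above compares generated competency questions with human-written references using BLEU. As a minimal sketch of what that score measures, the code below implements standard sentence-level BLEU (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty) in plain Python. It assumes whitespace tokenisation, uniform n-gram weights and no smoothing; this record does not state which BLEU variant, tokeniser or smoothing the authors used, so `ngrams` and `sentence_bleu` are hypothetical helpers, and the example questions are invented for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Sentence-level BLEU of one candidate against one or more references.

    candidate: list of tokens; references: non-empty list of token lists.
    Assumption: uniform weights over 1..max_n-grams and no smoothing, so any
    n-gram order with zero matches makes the whole score 0.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty: use the reference length closest to the candidate length
    # (ties resolved towards the shorter reference).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


# Illustrative (invented) generated question vs. one human reference.
generated = "what is the occupation of this person ?".split()
human_refs = ["what is the occupation of a person ?".split()]
print(round(sentence_bleu(generated, human_refs), 3))  # → 0.595
```

Here the four clipped precisions are 7/8, 5/7, 3/6 and 2/5, whose geometric mean is 0.125^(1/4) ≈ 0.595, and the brevity penalty is 1 because both questions have eight tokens. Short questions often have no matching 4-grams, which drives unsmoothed BLEU-4 to zero; evaluations of short texts therefore commonly apply a smoothing method or report corpus-level BLEU instead.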
Pages: 18
References: 67