Towards Zero-shot Knowledge Graph building: Automated Schema Inference

Cited by: 0

Authors:

Carta, Salvatore [1]

Giuliani, Alessandro [1]

Manca, Marco Manolo [1]

Piano, Leonardo [1]

Tiddia, Sandro Gabriele [1]

Affiliations:

[1] Univ Cagliari, Dept Math & Comp Sci, Cagliari, Italy

Source:

ADJUNCT PROCEEDINGS OF THE 32ND ACM CONFERENCE ON USER MODELING, ADAPTATION AND PERSONALIZATION, UMAP 2024 | 2024

Keywords:

Ontology Learning; Large Language Models; Named Entity Recognition

DOI:

10.1145/3631700.3665234

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In the current Digital Transformation scenario, Knowledge Graphs are essential for comprehending, representing, and exploiting complex information in a structured form. The main paradigm in automatically generating proper Knowledge Graphs relies on predefined schemas or ontologies. Such schemas are typically manually constructed, requiring an intensive human effort, and are often sensitive to information loss due to negligence, incomplete analysis, or human subjectivity or inclination. Limiting human bias and the resulting information loss in creating proper Knowledge Graphs is paramount, particularly for user modeling in various sectors, such as education or healthcare. To this end, we propose a novel approach to automatically generating a proper entity schema. The devised methodology combines the language understanding capabilities of Large Language Models (LLMs) with classical machine learning methods such as clustering to properly build an entity schema from a set of documents. This solution eliminates the need for human intervention and fosters a more efficient and comprehensive knowledge representation. The assessment of our proposal concerns adopting a state-of-the-art entity extraction model (UniNER) to estimate the relevance of the extracted entities based on the generated schema. Results confirm the potential of our approach, as we observed a negligible difference between the topic similarity score obtained with the ground truth and with the automatically generated schema (less than 1% on average on three different datasets). Such an outcome confirms that the proposed approach may be valuable in automatically creating an entity schema from a set of documents.
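
Illustrative sketch (not the authors' implementation): the abstract states only that LLM output is combined with clustering to build an entity schema. It does not say how candidate types are produced, how they are represented, or which clustering algorithm is used. The Python sketch below assumes an LLM has already proposed free-form entity type labels (the hypothetical input `candidate_labels`). It groups near-duplicate labels with character-trigram Jaccard similarity and a greedy threshold rule, which is a simple stand-in for whatever clustering the paper actually applies. The names `infer_schema` and `threshold` are likewise assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter


def trigrams(label: str) -> set[str]:
    """Character trigrams of a normalized label, padded so short labels still yield grams."""
    text = f"  {label.lower().strip()} "
    return {text[i:i + 3] for i in range(len(text) - 2)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def infer_schema(candidate_labels: list[str], threshold: float = 0.4) -> list[str]:
    """Greedy threshold clustering of candidate type labels.

    Returns one representative label per cluster. `threshold` is an
    assumed, hand-picked value, not a figure taken from the paper.
    """
    counts = Counter(label.lower().strip() for label in candidate_labels)
    clusters: list[list[str]] = []
    # Visit frequent labels first so that they seed the clusters.
    for label, _ in counts.most_common():
        grams = trigrams(label)
        for cluster in clusters:
            # Compare against the cluster seed only (its first member).
            if jaccard(grams, trigrams(cluster[0])) >= threshold:
                cluster.append(label)
                break
        else:
            clusters.append([label])
    # Each cluster is represented by its seed label.
    return [cluster[0] for cluster in clusters]


# Hypothetical labels, as an LLM might propose them across several documents.
labels = ["Person", "person", "Persons", "Organization",
          "organisation", "Location", "location"]
schema = infer_schema(labels)
```

With these hypothetical labels, spelling and plural variants collapse into three entity types. A realistic pipeline would more likely cluster semantic embeddings of the labels, since surface similarity cannot merge synonyms such as "company" and "organization".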


Pages: 467-473

Page count: 7

References: 21 in total

