Towards Zero-shot Knowledge Graph building: Automated Schema Inference

Cited by: 0
Authors
Carta, Salvatore [1]
Giuliani, Alessandro [1]
Manca, Marco Manolo [1]
Piano, Leonardo [1]
Tiddia, Sandro Gabriele [1]
Affiliations
[1] Univ Cagliari, Dept Math & Comp Sci, Cagliari, Italy
Source
ADJUNCT PROCEEDINGS OF THE 32ND ACM CONFERENCE ON USER MODELING, ADAPTATION AND PERSONALIZATION, UMAP 2024 | 2024
Keywords
Ontology Learning; Large Language Models; Named Entity Recognition
DOI
10.1145/3631700.3665234
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In the current Digital Transformation scenario, Knowledge Graphs are essential for comprehending, representing, and exploiting complex information in a structured form. The main paradigm for automatically generating Knowledge Graphs relies on predefined schemas or ontologies. Such schemas are typically constructed manually, which requires intensive human effort, and they are prone to information loss due to negligence, incomplete analysis, or human subjectivity or inclination. Limiting human bias and the resulting information loss when creating Knowledge Graphs is paramount, particularly for user modeling in sectors such as education or healthcare. To this end, we propose a novel approach to automatically generating an entity schema. The devised methodology combines the language understanding capabilities of Large Language Models (LLMs) with classical machine learning methods, such as clustering, to build an entity schema from a set of documents. This solution eliminates the need for human intervention and fosters a more efficient and comprehensive knowledge representation. To assess our proposal, we adopt a state-of-the-art entity extraction model (UniNER) to estimate the relevance of the entities extracted with the generated schema. Results confirm the potential of our approach: the difference between the topic similarity score obtained with the ground truth schema and with the automatically generated schema is negligible (less than 1% on average across three different datasets). This outcome suggests that the proposed approach may be valuable for automatically creating an entity schema from a set of documents.
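The abstract gives no implementation details, so the following is only a minimal illustrative sketch of the general idea of clustering candidate entity-type labels into a schema. It is not the authors' method. It assumes the candidate labels have already been proposed by an LLM (here they are a hard-coded list), and it swaps in character-trigram vectors with a greedy single-pass clustering in place of whatever embeddings and clustering algorithm the paper actually uses. The names `trigram_vector`, `cosine`, `infer_schema`, and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from math import sqrt


def trigram_vector(label):
    """Bag of character trigrams: a crude stand-in for a real text embedding."""
    padded = f"  {label.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def infer_schema(candidate_labels, threshold=0.5):
    """Group near-duplicate labels and return one representative per group.

    Greedy single-pass clustering (an assumption, chosen for brevity): each
    label joins the first cluster whose seed label is similar enough,
    otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of labels; the first one is the seed
    for label in candidate_labels:
        vec = trigram_vector(label)
        for cluster in clusters:
            if cosine(vec, trigram_vector(cluster[0])) >= threshold:
                cluster.append(label)
                break
        else:
            clusters.append([label])
    # The most frequent label in each cluster becomes the schema entity type.
    return [Counter(cluster).most_common(1)[0][0] for cluster in clusters]


# Hypothetical candidate types, as an LLM might propose them per document.
candidates = ["person", "persons", "person", "organization",
              "organisation", "organization", "location"]
print(infer_schema(candidates))
```

In a realistic pipeline the trigram vectors would be replaced by sentence embeddings and the greedy pass by a standard clustering algorithm; the sketch only shows how spelling variants of a type collapse into a single schema entry.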
Pages: 467-473
Page count: 7