Balancing Efficiency and Quality in LLM-Based Entity Resolution on Structured Data

Times cited: 0
Authors
Nananukul, Navapat [1]
Kejriwal, Mayank [1]
Affiliation
[1] Univ Southern Calif, Inst Informat Sci, Los Angeles, CA 90292 USA
Source
SOCIAL NETWORKS ANALYSIS AND MINING, ASONAM 2024, PT III | 2025 / Vol. 15213
Keywords
Entity resolution; identity matching; knowledge graphs; blocking; efficiency; large language models
DOI
10.1007/978-3-031-78548-1_21
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Entity Resolution (ER) is the problem of automatically determining when two or more entities refer to the same underlying entity. ER has been researched for over fifty years across multiple domains (including healthcare, e-commerce, and census data). ER can be particularly important in graph-based applications, such as deduplicating identities across (or even within) social media platforms, and in knowledge graphs. Traditionally, ER was a difficult problem both within Artificial Intelligence (AI) and in databases, owing to the quadratic O(n^2) complexity of comparing n entities to each other, given one or more graphs with n total nodes. The recent emergence of large language models (LLMs) allows us to address the challenges of ER as an AI problem, but a clear framework for applying LLMs in a cost-effective way remains an open issue. In this paper, we present such a framework and validate it through early experiments on real-world ER benchmarks. The framework is LLM-agnostic and is premised on assumptions that resemble pragmatic real-world requirements.
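The abstract's efficiency argument rests on the number of candidate pairs: comparing every entity with every other one yields n(n-1)/2 pairs, which is O(n^2). The Python sketch below illustrates that count and how blocking (one of this record's keywords) limits comparisons to records that share a key. It is a minimal illustration under stated assumptions, not the framework proposed in the paper; the sample names, the functions `all_pairs`, `blocked_pairs`, and `first_token`, and the choice of blocking key are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs(records):
    """Naive candidate generation: every unordered pair, n*(n-1)/2 in total."""
    return list(combinations(range(len(records)), 2))


def blocked_pairs(records, key):
    """Compare only records that share a blocking key (key function is hypothetical)."""
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[key(rec)].append(i)
    pairs = []
    for ids in blocks.values():
        # Pairs are generated within each block only, never across blocks.
        pairs.extend(combinations(ids, 2))
    return pairs


def first_token(name):
    """Hypothetical blocking key: the lowercased first token of a name."""
    return name.split()[0].lower()


if __name__ == "__main__":
    # Hypothetical sample records, not taken from the paper's benchmarks.
    names = [
        "Acme Corp", "ACME Corporation", "Acme Inc",
        "Globex LLC", "Globex Corp",
        "Initech", "Umbrella Co",
    ]
    print(len(all_pairs(names)))                   # 21 = 7 * 6 / 2
    print(len(blocked_pairs(names, first_token)))  # 4 (3 Acme pairs + 1 Globex pair)
```

With these seven names, the naive scheme produces 21 candidate pairs while first-token blocking produces 4. The cost is that true matches whose keys differ (for example, a misspelled first token) are never compared, which is the kind of efficiency-versus-quality trade-off named in the title.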
Pages: 278-293
Page count: 16