A Dual-Way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking

Cited by: 0
Authors
Song, Shezheng [1]
Zhao, Shan [1]
Wang, Chengyu [2]
Yan, Tianwei [2]
Li, Shasha [2]
Mao, Xiaoguang [2]
Wang, Meng [1]
Affiliations
[1] Hefei Univ Technol, Hefei, Peoples R China
[2] Natl Univ Def Technol, Changsha, Peoples R China
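Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17 | 2024
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract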
Multimodal Entity Linking (MEL) aims at linking ambiguous mentions with multimodal information to entities in a Knowledge Graph (KG) such as Wikipedia, which plays a key role in many applications. However, existing methods suffer from shortcomings, including modality impurity, such as noise in raw images, and ambiguous textual entity representations, which hinder MEL. We formulate multimodal entity linking as a neural text matching problem in which each piece of multimodal information (text and image) is treated as a query, and the model learns the mapping from each query to the relevant entity among the candidate entities. This paper introduces a dual-way enhanced (DWE) framework for MEL: (1) our model refines queries with multimodal data and addresses semantic gaps using cross-modal enhancers between text and image information. In addition, DWE leverages fine-grained image attributes, including facial characteristics and scene features, to enhance and refine visual features. (2) By using Wikipedia descriptions, DWE enriches entity semantics and obtains a more comprehensive textual representation, which reduces the difference between the textual representation and the entities in the KG. Extensive experiments on three public benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance, indicating the superiority of our model. The code is released at https://github.com/season1blue/DWE.
Pages: 19008-19016
Number of pages: 9
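The abstract frames entity linking as matching a multimodal query against candidate entities. The sketch below illustrates only that general idea with toy vectors; it is not the DWE implementation. The weighted-average fusion, the cosine scoring, and the names `fuse`, `link`, and `alpha` are assumptions made for this illustration, and the vectors and entity names are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fuse(text_vec, image_vec, alpha=0.5):
    """Hypothetical fusion: a weighted average of text and image features.

    The paper describes cross-modal enhancers; this simple average is only a stand-in.
    """
    return [alpha * t + (1 - alpha) * i for t, i in zip(text_vec, image_vec)]


def link(query_vec, candidates):
    """Rank candidate entities by similarity to the query, best match first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy inputs (invented): features of a mention's text and of its image.
text_vec = [0.9, 0.1, 0.0]
image_vec = [0.7, 0.3, 0.0]

# Toy candidate entities, standing in for embeddings of KG entity descriptions.
candidates = {
    "Entity A": [1.0, 0.2, 0.0],
    "Entity B": [0.0, 1.0, 0.0],
    "Entity C": [0.0, 0.0, 1.0],
}

ranking = link(fuse(text_vec, image_vec), candidates)
best = ranking[0][0]
print(best)
```

In a learned system the vectors would come from trained text and image encoders and the scoring function would be trained end to end; the ranking step stays structurally the same.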