ICKA: An Instruction Construction and Knowledge Alignment Framework for Multimodal Named Entity Recognition

Cited by: 1
Authors
Zeng, Qingyang [1]
Yuan, Minghui [1]
Wan, Jing [1]
Wang, Kunfeng [1]
Shi, Nannan [2]
Che, Qianzi [2]
Liu, Bin [2]
Affiliations
[1] Beijing Univ Chem Technol, Beijing 100029, Peoples R China
[2] China Acad Chinese Med Sci, Inst Basic Res Clin Med, Beijing 100700, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Multimodal Named Entity Recognition; Multimodal learning; Semantic alignment; Visual language model; Social media; FUSION;
DOI
10.1016/j.eswa.2024.124867
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Multimodal Named Entity Recognition (MNER) aims to identify entities of predefined types in text by leveraging information from multiple modalities, most notably textual and visual information. Most efforts concentrate on improving cross-modality attention mechanisms to facilitate guidance between modalities. However, they still suffer from certain limitations: (1) it is difficult to establish a unified representation to bridge the semantic gap among different modalities; (2) mining the implicit relationships between text and image is crucial yet challenging. In this paper, we propose an Instruction Construction and Knowledge Alignment Framework for MNER named ICKA to address these issues. Specifically, we first employ a multi-head cross-modal attention mechanism to obtain the cross-modal fusion representation by fusing features from text-image pairs. Then, we integrate external knowledge from the pre-trained vision-language model (VLM) to facilitate semantic alignment between text and image and obtain inter-modality connections. Next, we construct the multimodal instruction that consists of the modal features and uses the inter-modality connections as a bridge between them. We then integrate the instruction into the language model to effectively incorporate multimodal knowledge. Finally, we perform sequence labeling using a Conditional Random Fields (CRF) decoder with a gating mechanism. The proposed method achieves F1 scores of 75.42% on the Twitter2015 dataset and 87.12% on the Twitter2017 dataset, demonstrating the competitiveness of our method.
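The first step in the abstract, multi-head cross-modal attention over text-image pairs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`softmax`, `cross_modal_attention`), the toy feature vectors, and the choice to skip learned query/key/value projections are all assumptions made purely to show how text tokens could attend over image regions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_attention(text_feats, image_feats, num_heads=2):
    """Text tokens (queries) attend over image regions (keys and values).

    Assumption: no learned projections; each head simply uses its own
    slice of the feature dimension, which is enough to show the data flow.
    """
    dim = len(text_feats[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    fused = []
    for query in text_feats:
        token_out = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            q = query[lo:hi]
            # Scaled dot-product score between this token and every region.
            scores = [
                sum(a * b for a, b in zip(q, region[lo:hi])) / math.sqrt(head_dim)
                for region in image_feats
            ]
            weights = softmax(scores)
            # Weighted sum of the region values for this head.
            head_out = [
                sum(w * region[lo + i] for w, region in zip(weights, image_feats))
                for i in range(head_dim)
            ]
            token_out.extend(head_out)
        fused.append(token_out)
    return fused

# Hypothetical toy inputs: 3 text tokens, 2 image regions, 4-dim features.
text = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
image = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
fused = cross_modal_attention(text, image, num_heads=2)
```

Each row of `fused` is a visually informed representation of one text token. In a full model along the lines the abstract describes, such a representation would presumably be combined with learned projections, VLM-derived alignment signals, and a gated CRF decoder, none of which are sketched here.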
Pages: 10