ICKA: An instruction construction and Knowledge Alignment framework for Multimodal Named Entity Recognition

Cited by: 1
Authors
Zeng, Qingyang [1]
Yuan, Minghui [1]
Wan, Jing [1]
Wang, Kunfeng [1]
Shi, Nannan [2]
Che, Qianzi [2]
Liu, Bin [2]
Affiliations
[1] Beijing Univ Chem Technol, Beijing 100029, Peoples R China
[2] China Acad Chinese Med Sci, Inst Basic Res Clin Med, Beijing 100700, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Multimodal Named Entity Recognition; Multimodal learning; Semantic alignment; Visual language model; Social media; FUSION;
DOI
10.1016/j.eswa.2024.124867
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal Named Entity Recognition (MNER) aims to identify entities of predefined types in text by leveraging information from multiple modalities, most notably textual and visual information. Most efforts concentrate on improving cross-modality attention mechanisms to facilitate guidance between modalities. However, they still suffer from certain limitations: (1) it is difficult to establish a unified representation that bridges the semantic gap among different modalities; (2) mining the implicit relationships between text and image is crucial yet challenging. In this paper, we propose an Instruction Construction and Knowledge Alignment framework for MNER, named ICKA, to address these issues. Specifically, we first employ a multi-head cross-modal attention mechanism to obtain a cross-modal fusion representation by fusing features from text-image pairs. Then, we integrate external knowledge from a pre-trained vision-language model (VLM) to facilitate semantic alignment between text and image and to obtain inter-modality connections. Next, we construct a multimodal instruction that consists of the modal features and uses the inter-modality connections as a bridge between them. We then integrate the instruction into the language model to effectively incorporate multimodal knowledge. Finally, we perform sequence labeling using a Conditional Random Field (CRF) decoder with a gating mechanism. The proposed method achieves F1 scores of 75.42% on the Twitter2015 dataset and 87.12% on the Twitter2017 dataset, demonstrating its competitiveness.
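To make the fusion step described in the abstract concrete, the sketch below shows a multi-head cross-modal attention layer in which text tokens attend to image regions, followed by a gated residual combination. It is a minimal illustration under stated assumptions: the module name `CrossModalFusion`, the feature dimensions, and the sigmoid gate design are hypothetical stand-ins, not the authors' implementation of ICKA.

```python
# A minimal sketch (assumptions, not the paper's code) of cross-modal fusion:
# text tokens query visual regions via multi-head attention, then a learned
# gate controls how much visual context each token absorbs.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuse token-level text features with region-level image features."""

    def __init__(self, hidden_dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Text tokens act as queries; visual regions act as keys/values.
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=hidden_dim, num_heads=num_heads, batch_first=True
        )
        # Hypothetical gate: decides, per token, how much visual evidence to keep.
        self.gate = nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim), nn.Sigmoid())

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:  (batch, seq_len, hidden_dim)   e.g. token embeddings from a text encoder
        # image_feats: (batch, regions, hidden_dim)   e.g. projected image patch features
        visual_context, _ = self.cross_attn(text_feats, image_feats, image_feats)
        g = self.gate(torch.cat([text_feats, visual_context], dim=-1))
        # Gated residual: tokens with little visual relevance stay close to the text features.
        return text_feats + g * visual_context


if __name__ == "__main__":
    fusion = CrossModalFusion()
    text = torch.randn(2, 32, 768)    # 2 sentences, 32 tokens each
    image = torch.randn(2, 49, 768)   # 2 images, 7x7 patch grid
    print(fusion(text, image).shape)  # torch.Size([2, 32, 768])
```

The gated residual reflects the general intuition behind gating in MNER models: noisy or unrelated images should not overwrite the textual representation; the exact gating used in ICKA's CRF decoder may differ.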
Pages: 10