ICKA: An instruction construction and Knowledge Alignment framework for Multimodal Named Entity Recognition

Cited by: 1

Authors:

Zeng, Qingyang [1]

Yuan, Minghui [1]

Wan, Jing [1]

Wang, Kunfeng [1]

Shi, Nannan [2]

Che, Qianzi [2]

Liu, Bin [2]

Affiliations:

[1] Beijing University of Chemical Technology, Beijing 100029, People's Republic of China

[2] China Academy of Chinese Medical Sciences, Institute of Basic Research in Clinical Medicine, Beijing 100700, People's Republic of China

Source:

Expert Systems with Applications | 2024 / Vol. 255

Funding:

Beijing Natural Science Foundation

Keywords:

Multimodal Named Entity Recognition; Multimodal learning; Semantic alignment; Visual language model; Social media; FUSION;

DOI:

10.1016/j.eswa.2024.124867

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Multimodal Named Entity Recognition (MNER) aims to identify entities of predefined types in text by leveraging information from multiple modalities, most notably textual and visual information. Most efforts concentrate on improving cross-modality attention mechanisms to facilitate guidance between modalities. However, they still suffer from certain limitations: (1) it is difficult to establish a unified representation to bridge the semantic gap among different modalities; (2) mining the implicit relationships between text and image is crucial yet challenging. In this paper, we propose an Instruction Construction and Knowledge Alignment Framework for MNER named ICKA to address these issues. Specifically, we first employ a multi-head cross-modal attention mechanism to obtain the cross-modal fusion representation by fusing features from text-image pairs. Then, we integrate external knowledge from the pre-trained vision-language model (VLM) to facilitate semantic alignment between text and image and obtain inter-modality connections. Next, we construct the multimodal instruction that consists of the modal features and uses the inter-modality connections as a bridge between them. We then integrate the instruction into the language model to effectively incorporate multimodal knowledge. Finally, we perform sequence labeling using a Conditional Random Fields (CRF) decoder with a gating mechanism. The proposed method achieves F1 scores of 75.42% on the Twitter2015 dataset and 87.12% on the Twitter2017 dataset, demonstrating the competitiveness of our method.
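
The first step in the abstract, multi-head cross-modal attention over text-image pairs, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the dimensions, the random weights, and the names `cross_modal_attention` and `gated_fusion` are assumptions made for illustration, and the gate shown here is a generic sigmoid gate on the fused features, not necessarily the gating the paper attaches to its CRF decoder. The VLM alignment, instruction construction, and CRF stages are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text, image, w_q, w_k, w_v, num_heads):
    """Text tokens (queries) attend over image regions (keys/values)."""
    n_text, d_model = text.shape
    n_img = image.shape[0]
    d_head = d_model // num_heads
    # Project, then split the model dimension into heads: (heads, seq, d_head).
    q = (text @ w_q).reshape(n_text, num_heads, d_head).transpose(1, 0, 2)
    k = (image @ w_k).reshape(n_img, num_heads, d_head).transpose(1, 0, 2)
    v = (image @ w_v).reshape(n_img, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product scores: (heads, n_text, n_img).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (heads, n_text, d_head)
    # Concatenate the heads back into the model dimension.
    fused = heads.transpose(1, 0, 2).reshape(n_text, d_model)
    return fused, weights

def gated_fusion(text, fused, w_g):
    """Hypothetical sigmoid gate: per feature, how much visual context to admit."""
    logits = np.concatenate([text, fused], axis=-1) @ w_g
    gate = 1.0 / (1.0 + np.exp(-logits))
    return text + gate * fused

# Toy inputs: 5 text tokens and 3 image regions, both already embedded in 8 dims.
rng = np.random.default_rng(0)
d_model, num_heads = 8, 2
text = rng.normal(size=(5, d_model))
image = rng.normal(size=(3, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
w_g = rng.normal(size=(2 * d_model, d_model))

fused, weights = cross_modal_attention(text, image, w_q, w_k, w_v, num_heads)
out = gated_fusion(text, fused, w_g)
print(fused.shape, weights.shape, out.shape)
```

Each row of `weights` is a distribution over image regions for one text token in one head, so the fused representation stays aligned token by token with the text sequence, which is the shape a sequence-labeling decoder expects.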


Pages: 10
