MAF: A General Matching and Alignment Framework for Multimodal Named Entity Recognition

Times cited: 47

Authors:

Xu, Bo [1]

Huang, Shizhou [1]

Sha, Chaofeng [2]

Wang, Hongya [1]

Affiliations:

[1] Donghua Univ, Sch Comp Sci & Technol, Shanghai, Peoples R China

[2] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligence Proc, Shanghai, Peoples R China

Source:

WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING | 2022

Funding:

National Natural Science Foundation of China

Keywords:

multimodal named entity recognition; contrastive learning

DOI:

10.1145/3488560.3498475

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we study multimodal named entity recognition (MNER) in social media posts. Existing works mainly use a cross-modal attention mechanism to combine the text representation with the image representation. However, they still suffer from two weaknesses: (1) current methods rest on the strong assumption that each text and its accompanying image are matched, so that the image can always help identify named entities in the text. This assumption does not always hold in real scenarios, and relying on it may degrade the recognition performance of MNER models; (2) current methods fail to construct a consistent representation that bridges the semantic gap between the two modalities, which prevents the model from establishing a good connection between the text and the image. To address these issues, we propose a general matching and alignment framework (MAF) for multimodal named entity recognition in social media posts. Specifically, to solve the first issue, we propose a novel cross-modal matching (CM) module that calculates a similarity score between the text and the image and uses the score to determine the proportion of visual information that should be retained. To solve the second issue, we propose a novel cross-modal alignment (CA) module that makes the representations of the two modalities more consistent. We conduct extensive experiments, ablation studies, and case studies to demonstrate the effectiveness and efficiency of our method. The source code of this paper is available at https://github.com/xubodhu/MAF.
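The abstract describes the two modules only at a high level. As a rough illustration, here is a minimal pure-Python sketch of the two ideas, assuming that the matching score is a sigmoid over the cosine similarity of pooled text and image vectors, and that alignment uses a standard text-to-image InfoNCE contrastive loss. These choices, the `scale` and `tau` values, and the function names `cosine`, `match_gate`, `retain_visual` and `info_nce` are hypothetical; the authors' actual implementation is in the repository linked above and may differ.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_gate(text: Vector, image: Vector, scale: float = 5.0) -> float:
    """Hypothetical cross-modal matching (CM) score in (0, 1).

    Assumption: a sigmoid over the scaled text-image cosine similarity.
    `scale` is an illustrative sharpness parameter, not a value from the paper.
    """
    return 1.0 / (1.0 + math.exp(-scale * cosine(text, image)))


def retain_visual(text: Vector, image: Vector) -> Vector:
    """Scale the image features by the matching score, so a weakly matched
    image contributes proportionally less visual information to fusion."""
    g = match_gate(text, image)
    return [g * v for v in image]


def info_nce(texts: List[Vector], images: List[Vector], tau: float = 0.1) -> float:
    """Hypothetical cross-modal alignment (CA) loss: text-to-image InfoNCE.

    The i-th text and i-th image form the positive pair; every other image
    in the batch acts as a negative. `tau` is an assumed temperature.
    """
    total = 0.0
    for i, t in enumerate(texts):
        logits = [cosine(t, img) / tau for img in images]
        m = max(logits)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(texts)


if __name__ == "__main__":
    texts = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]  # each image pairs with its own text
    swapped = [[0.1, 0.9], [0.9, 0.1]]  # images paired with the wrong text
    print(match_gate(texts[0], matched[0]), match_gate(texts[0], swapped[0]))
    print(info_nce(texts, matched), info_nce(texts, swapped))
```

Running the script prints a higher gate for the matched text-image pair than for the swapped one, and a lower alignment loss when each text is paired with its own image. In a real model the vectors would come from text and image encoders and the loss would be minimized by gradient descent; plain lists simply keep the sketch dependency-free.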


Pages: 1215-1223

Number of pages: 9

Related records (50 in total; entries 1-10 listed):

[1] ICKA: An instruction construction and Knowledge Alignment framework for Multimodal Named Entity Recognition
Zeng, Qingyang
Yuan, Minghui
Wan, Jing
Wang, Kunfeng
Shi, Nannan
Che, Qianzi
Liu, Bin
EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
[2] Visual Clue Guidance and Consistency Matching Framework for Multimodal Named Entity Recognition
He, Li
Wang, Qingxiang
Liu, Jie
Duan, Jianyong
Wang, Hao
APPLIED SCIENCES-BASEL, 2024, 14 (06)
[3] A Survey on Multimodal Named Entity Recognition
Qian, Shenyi
Jin, Wenduo
Chen, Yonggang
Ma, Jiangtao
Qiao, Yaqiong
Lu, Jinyu
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT IV, 2023, 14089: 609-622
[4] Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition
Jia, Meihuizi
Shen, Xin
Shen, Lei
Pang, Jinhui
Liao, Lejian
Song, Yang
Chen, Meng
He, Xiaodong
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 3549-3558
[5] A Multi-expert Collaborative Framework for Multimodal Named Entity Recognition
Xu, Bo
Jiang, Haiqi
Wei, Shouang
Du, Ming
Song, Hui
Wang, Hongya
MULTIMEDIA MODELING, MMM 2025, PT I, 2025, 15520: 30-43
[6] MESA: A Multimodal Entity Entailment framework for multimodal Entity Alignment
Zhao, Yu
Zhang, Ying
Sui, Xuhui
Cai, Xiangrui
INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (01)
[7] A multi-task framework based on decomposition for multimodal named entity recognition
Cai, Chenran
Wang, Qianlong
Qin, Bing
Xu, Ruifeng
NEUROCOMPUTING, 2024, 604
[8] Dynamic Graph Construction Framework for Multimodal Named Entity Recognition in Social Media
Mai, Weixing
Zhang, Zhengxuan
Li, Kuntao
Xue, Yun
Li, Fenghuan
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (02): 2513-2522
[9] HORUS-NER: A Multimodal Named Entity Recognition Framework for Noisy Data
Esteves, Diego
Marcelino, Jose
Chawla, Piyush
Fischer, Asja
Lehmann, Jens
ADVANCES IN INTELLIGENT DATA ANALYSIS XIX, IDA 2021, 2021, 12695: 89-100
[10] Fine-Grained Multimodal Named Entity Recognition and Grounding with a Generative Framework
Wang, Jieming
Li, Ziyan
Yu, Jianfei
Yang, Li
Xia, Rui
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 3934-3943
