MGICL: Multi-Grained Interaction Contrastive Learning for Multimodal Named Entity Recognition

Cited by: 3
Authors
Guo, Aibo [1 ]
Zhao, Xiang [1 ]
Tan, Zhen [1 ]
Xiao, Weidong [1 ]
Affiliations
[1] Natl Univ Def Technol, Changsha, Hunan, Peoples R China
Source
PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023 | 2023
Keywords
Multimodal named entity recognition; Multimodal representation; Contrastive learning; Multi-grained interaction contrastive learning; Visual gate
DOI
10.1145/3583780.3614967
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Multimodal Named Entity Recognition (MNER) aims to combine data from different modalities (e.g., text, images, and videos) to recognize and classify named entities, which is crucial for constructing Multimodal Knowledge Graphs (MMKGs). However, existing research suffers from two prominent issues: over-reliance on textual features while neglecting visual features, and the lack of an effective way to reduce the feature-space discrepancy between multimodal data. To overcome these challenges, this paper proposes a Multi-Grained Interaction Contrastive Learning framework for the MNER task, namely MGICL. MGICL slices data into different granularities, i.e., sentence level/word token level for text, and image level/object level for images. By utilizing multimodal features of different granularities, the framework enables cross-contrast and narrows the feature-space discrepancy between modalities, while also helping the text acquire valuable visual features. Additionally, a visual gate control mechanism is introduced to dynamically select relevant visual information, thereby reducing the impact of visual noise. Experimental results demonstrate that the proposed MGICL framework satisfactorily tackles the challenges of MNER by enhancing the information interaction of multimodal data and reducing the effect of noise, and hence effectively improves MNER performance.
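The abstract describes two mechanisms: cross-modal contrastive learning over multi-grained features, and a visual gate that filters noisy visual signals. Below is a minimal NumPy sketch of how such components are commonly realized (an InfoNCE-style symmetric contrastive loss and a sigmoid gate); the function names, shapes, and loss choice are our assumptions for illustration, not details from the paper:

```python
import numpy as np

def info_nce(text_feats, vis_feats, temperature=0.1):
    """Symmetric InfoNCE-style contrastive loss: matched text/visual pairs
    are positives; all other pairings in the batch serve as negatives."""
    # L2-normalize so dot products are cosine similarities
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    v = vis_feats / np.linalg.norm(vis_feats, axis=1, keepdims=True)
    logits = t @ v.T / temperature        # (B, B) similarity matrix
    labels = np.arange(len(t))            # diagonal entries are positive pairs

    def xent(lg):
        # numerically stable cross-entropy against the diagonal targets
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average of text->image and image->text directions
    return 0.5 * (xent(logits) + xent(logits.T))

def visual_gate(text_vec, vis_vec, W_t, W_v, b):
    """Scalar sigmoid gate in (0, 1) deciding how much visual evidence
    to admit; W_t, W_v, b would be learned parameters in practice."""
    g = 1.0 / (1.0 + np.exp(-(text_vec @ W_t + vis_vec @ W_v + b)))
    return g * vis_vec  # gated visual contribution passed to the text side
```

With aligned text/visual pairs the contrastive loss is low, and it rises when pairs are shuffled; the gate smoothly interpolates between suppressing and passing the visual vector.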
Pages: 639-648
Page count: 10
相关论文
共 32 条
  • [1] Asgari-Chenaghlu M, 2020, Arxiv, DOI arXiv:2001.06888
  • [2] Multimodal Named Entity Recognition with Image Attributes and Image Knowledge
    Chen, Dawei
    Li, Zhixu
    Gu, Binbin
    Chen, Zhigang
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 186 - 201
  • [3] Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion
    Chen, Xiang
    Zhang, Ningyu
    Li, Lei
    Deng, Shumin
    Tan, Chuanqi
    Xu, Changliang
    Huang, Fei
    Si, Luo
    Chen, Huajun
    [J]. PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 904 - 915
  • [4] Chen Xin, 2022, arXiv, DOI [DOI 10.48550/ARXIV.2205, 10.48550/arXiv.2205]
  • [5] Conneau Alexis, 2020, ACL, P8440, DOI DOI 10.18653/V1/2020.ACL-MAIN.747
  • [6] Jia M., 2022, arXiv, DOI 10.48550/arXiv.2211.14739
  • [7] Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition
    Jia, Meihuizi
    Shen, Xin
    Shen, Lei
    Pang, Jinhui
    Liao, Lejian
    Song, Yang
    Chen, Meng
    He, Xiaodong
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3549 - 3558
  • [8] Li JH, 2021, ADV NEUR IN, V34
  • [9] UAMNer: uncertainty-aware multimodal named entity recognition in social media posts
    Liu, Luping
    Wang, Meiling
    Zhang, Mozhi
    Qing, Linbo
    He, Xiaohai
    [J]. APPLIED INTELLIGENCE, 2022, 52 (04) : 4109 - 4125
  • [10] Lu D, 2018, PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, P1990