MGICL: Multi-Grained Interaction Contrastive Learning for Multimodal Named Entity Recognition

Cited by: 3
Authors
Guo, Aibo [1 ]
Zhao, Xiang [1 ]
Tan, Zhen [1 ]
Xiao, Weidong [1 ]
Affiliations
[1] Natl Univ Def Technol, Changsha, Hunan, Peoples R China
Source
PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023 | 2023
Keywords
Multimodal named entity recognition; Multimodal representation; Contrastive learning; Multi-grained interaction contrastive learning; Visual gate
DOI
10.1145/3583780.3614967
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Multimodal Named Entity Recognition (MNER) aims to combine data from different modalities (e.g., text, images, and videos) to recognize and classify named entities, which is crucial for constructing Multimodal Knowledge Graphs (MMKGs). However, existing research suffers from two prominent issues: over-reliance on textual features while neglecting visual features, and the lack of an effective way to reduce the feature-space discrepancy between multimodal data. To overcome these challenges, this paper proposes a Multi-Grained Interaction Contrastive Learning framework for the MNER task, namely MGICL. MGICL slices data into different granularities, i.e., sentence level/word token level for text, and image level/object level for images. By utilizing multimodal features of different granularities, the framework enables cross-contrast and narrows the feature-space discrepancy between modalities, while also helping the text acquire valuable visual features. Additionally, a visual gate control mechanism is introduced to dynamically select relevant visual information, thereby reducing the impact of visual noise. Experimental results demonstrate that the proposed MGICL framework satisfactorily tackles the challenges of MNER by enhancing the information interaction of multimodal data and reducing the effect of noise, and hence effectively improves MNER performance.
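The abstract describes two mechanisms: cross-modal contrastive learning over multi-grained features, and a visual gate that filters noisy visual signals. Below is a minimal NumPy sketch of how such components are commonly realized (an InfoNCE-style symmetric contrastive loss and a sigmoid gate); the function names, shapes, and loss choice are our assumptions for illustration, not details from the paper:

```python
import numpy as np

def info_nce(text_feats, vis_feats, temperature=0.1):
    """Symmetric InfoNCE-style contrastive loss: matched text/visual pairs
    are positives; all other pairings in the batch serve as negatives."""
    # L2-normalize so dot products are cosine similarities
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    v = vis_feats / np.linalg.norm(vis_feats, axis=1, keepdims=True)
    logits = t @ v.T / temperature        # (B, B) similarity matrix
    labels = np.arange(len(t))            # diagonal entries are positive pairs

    def xent(lg):
        # numerically stable cross-entropy against the diagonal targets
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average of text->image and image->text directions
    return 0.5 * (xent(logits) + xent(logits.T))

def visual_gate(text_vec, vis_vec, W_t, W_v, b):
    """Scalar sigmoid gate in (0, 1) deciding how much visual evidence
    to admit; W_t, W_v, b would be learned parameters in practice."""
    g = 1.0 / (1.0 + np.exp(-(text_vec @ W_t + vis_vec @ W_v + b)))
    return g * vis_vec  # gated visual contribution passed to the text side
```

With aligned text/visual pairs the contrastive loss is low, and it rises when pairs are shuffled; the gate smoothly interpolates between suppressing and passing the visual vector.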
Pages: 639-648
Page count: 10
相关论文
共 32 条
  • [1] Asgari-Chenaghlu M, 2020, Arxiv, DOI arXiv:2001.06888
  • [2] Multimodal Named Entity Recognition with Image Attributes and Image Knowledge
    Chen, Dawei
    Li, Zhixu
    Gu, Binbin
    Chen, Zhigang
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 186 - 201
  • [3] Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion
    Chen, Xiang
    Zhang, Ningyu
    Li, Lei
    Deng, Shumin
    Tan, Chuanqi
    Xu, Changliang
    Huang, Fei
    Si, Luo
    Chen, Huajun
    [J]. PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 904 - 915
  • [4] Chen Xin, 2022, arXiv, DOI [DOI 10.48550/ARXIV.2205, 10.48550/arXiv.2205]
  • [5] Conneau Alexis, 2020, ACL, P8440, DOI DOI 10.18653/V1/2020.ACL-MAIN.747
  • [6] Jia M., 2022, arXiv, DOI 10.48550/arXiv.2211.14739
  • [7] Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition
    Jia, Meihuizi
    Shen, Xin
    Shen, Lei
    Pang, Jinhui
    Liao, Lejian
    Song, Yang
    Chen, Meng
    He, Xiaodong
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3549 - 3558
  • [8] Li JH, 2021, ADV NEUR IN, V34
  • [9] UAMNer: uncertainty-aware multimodal named entity recognition in social media posts
    Liu, Luping
    Wang, Meiling
    Zhang, Mozhi
    Qing, Linbo
    He, Xiaohai
    [J]. APPLIED INTELLIGENCE, 2022, 52 (04) : 4109 - 4125
  • [10] Lu D, 2018, PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, P1990