MAF: A General Matching and Alignment Framework for Multimodal Named Entity Recognition

Cited by: 47
Authors
Xu, Bo [1]
Huang, Shizhou [1]
Sha, Chaofeng [2]
Wang, Hongya [1]
Affiliations
[1] Donghua Univ, Sch Comp Sci & Technol, Shanghai, Peoples R China
[2] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligence Proc, Shanghai, Peoples R China
Source
WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING | 2022
Funding
National Natural Science Foundation of China;
Keywords
multimodal named entity recognition; contrastive learning;
DOI
10.1145/3488560.3498475
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study multimodal named entity recognition (MNER) in social media posts. Existing works mainly focus on using a cross-modal attention mechanism to combine text representation with image representation. However, they still suffer from two weaknesses: (1) current methods rest on the strong assumption that each text and its accompanying image are matched, so that the image can be used to help identify named entities in the text. This assumption does not always hold in real scenarios, and relying on it may reduce the recognition performance of the MNER model; (2) current methods fail to construct a consistent representation to bridge the semantic gap between the two modalities, which prevents the model from establishing a good connection between the text and image. To address these issues, we propose a general matching and alignment framework (MAF) for multimodal named entity recognition in social media posts. Specifically, to solve the first issue, we propose a novel cross-modal matching (CM) module to calculate the similarity score between text and image, and use the score to determine the proportion of visual information that should be retained. To solve the second issue, we propose a novel cross-modal alignment (CA) module to make the representations of the two modalities more consistent. We conduct extensive experiments, ablation studies, and case studies to demonstrate the effectiveness and efficiency of our method. The source code of this paper can be found at https://github.com/xubodhu/MAF.
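The two ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The abstract does not give formulas, so everything below is an assumption made for illustration: cosine similarity rescaled to [0, 1] stands in for the CM similarity score, a scalar gate stands in for "the proportion of visual information that should be retained", and an InfoNCE-style contrastive loss stands in for the CA alignment objective. All function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def matching_score(text_vec, image_vec):
    """Assumed stand-in for the CM score: cosine similarity mapped from [-1, 1] to [0, 1]."""
    return 0.5 * (cosine_similarity(text_vec, image_vec) + 1.0)


def gate_visual(image_vec, score):
    """Keep a fraction `score` of the visual features (assumed scalar gating)."""
    return [score * x for x in image_vec]


def contrastive_alignment_loss(text_vecs, image_vecs, temperature=0.1):
    """InfoNCE-style text-to-image loss (assumed stand-in for the CA objective).

    The i-th text and i-th image form the positive pair; every other image
    in the batch acts as a negative for that text.
    """
    n = len(text_vecs)
    total = 0.0
    for i in range(n):
        logits = [
            cosine_similarity(text_vecs[i], image_vecs[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / n


if __name__ == "__main__":
    text = [1.0, 0.0]
    unrelated_image = [-1.0, 0.0]
    score = matching_score(text, unrelated_image)
    print("matching score:", score)
    print("gated visual features:", gate_visual(unrelated_image, score))
```

Under these assumptions, a mismatched image receives a score near 0 and contributes almost nothing after gating, while the contrastive loss is small when each text is closest to its own image and large when the pairs are shuffled.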
Pages: 1215 - 1223
Number of pages: 9
Related papers
50 in total
  • [1] ICKA: An instruction construction and Knowledge Alignment framework for Multimodal Named Entity Recognition
    Zeng, Qingyang
    Yuan, Minghui
    Wan, Jing
    Wang, Kunfeng
    Shi, Nannan
    Che, Qianzi
    Liu, Bin
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
  • [2] Visual Clue Guidance and Consistency Matching Framework for Multimodal Named Entity Recognition
    He, Li
    Wang, Qingxiang
    Liu, Jie
    Duan, Jianyong
    Wang, Hao
    APPLIED SCIENCES-BASEL, 2024, 14 (06)
  • [3] A Survey on Multimodal Named Entity Recognition
    Qian, Shenyi
    Jin, Wenduo
    Chen, Yonggang
    Ma, Jiangtao
    Qiao, Yaqiong
    Lu, Jinyu
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT IV, 2023, 14089 : 609 - 622
  • [4] Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition
    Jia, Meihuizi
    Shen, Xin
    Shen, Lei
    Pang, Jinhui
    Liao, Lejian
    Song, Yang
    Chen, Meng
    He, Xiaodong
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022 : 3549 - 3558
  • [5] A Multi-expert Collaborative Framework for Multimodal Named Entity Recognition
    Xu, Bo
    Jiang, Haiqi
    Wei, Shouang
    Du, Ming
    Song, Hui
    Wang, Hongya
    MULTIMEDIA MODELING, MMM 2025, PT I, 2025, 15520 : 30 - 43
  • [6] MESA: A Multimodal Entity Entailment framework for multimodal Entity Alignment
    Zhao, Yu
    Zhang, Ying
    Sui, Xuhui
    Cai, Xiangrui
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (01)
  • [7] A multi-task framework based on decomposition for multimodal named entity recognition
    Cai, Chenran
    Wang, Qianlong
    Qin, Bing
    Xu, Ruifeng
    NEUROCOMPUTING, 2024, 604
  • [8] Dynamic Graph Construction Framework for Multimodal Named Entity Recognition in Social Media
    Mai, Weixing
    Zhang, Zhengxuan
    Li, Kuntao
    Xue, Yun
    Li, Fenghuan
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (02) : 2513 - 2522
  • [9] HORUS-NER: A Multimodal Named Entity Recognition Framework for Noisy Data
    Esteves, Diego
    Marcelino, Jose
    Chawla, Piyush
    Fischer, Asja
    Lehmann, Jens
    ADVANCES IN INTELLIGENT DATA ANALYSIS XIX, IDA 2021, 2021, 12695 : 89 - 100
  • [10] Fine-Grained Multimodal Named Entity Recognition and Grounding with a Generative Framework
    Wang, Jieming
    Li, Ziyan
    Yu, Jianfei
    Yang, Li
    Xia, Rui
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 3934 - 3943