Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition

Cited by: 20
Authors
Jia, Meihuizi [1, 2]
Shen, Xin [3]
Shen, Lei [2]
Pang, Jinhui [1]
Liao, Lejian [1]
Song, Yang [2]
Chen, Meng [2]
He, Xiaodong [2]
Affiliations
[1] Beijing Institute of Technology, Beijing, People's Republic of China
[2] JD AI, Beijing, People's Republic of China
[3] Australian National University, Canberra, ACT, Australia
Source
Proceedings of the 30th ACM International Conference on Multimedia, MM 2022 | 2022
Funding
National Key Research and Development Program of China;
Keywords
multimodal named entity recognition; machine reading comprehension; visual grounding; transfer learning;
DOI
10.1145/3503161.3548427
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Multimodal named entity recognition (MNER) is a vision-language task where the system is required to detect entity spans and corresponding entity types given a sentence-image pair. Existing methods capture text-image relations with various attention mechanisms that only obtain implicit alignments between entity types and image regions. To locate regions more accurately and better model cross-/within-modal relations, we propose a machine reading comprehension based framework for MNER, namely MRC-MNER. By utilizing queries in MRC, our framework can provide prior information about entity types and image regions. Specifically, we design two stages, Query-Guided Visual Grounding and Multi-Level Modal Interaction, to align fine-grained type-region information and simulate text-image/inner-text interactions respectively. For the former, we train a visual grounding model via transfer learning to extract region candidates that can be further integrated into the second stage to enhance token representations. For the latter, we design text-image and inner-text interaction modules along with three sub-tasks for MRC-MNER. To verify the effectiveness of our model, we conduct extensive experiments on two public MNER datasets, Twitter2015 and Twitter2017. Experimental results show that MRC-MNER outperforms the current state-of-the-art models on Twitter2017, and yields competitive results on Twitter2015.
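The MRC formulation mentioned in the abstract can be illustrated with a minimal sketch: each entity type becomes a natural-language query, and the sentence becomes the context from which answer spans are extracted. The query wording, the four-type label set (PER, LOC, ORG, MISC), the `MRCExample` class, the `build_mrc_examples` helper, and the example sentence are all assumptions made for illustration; the abstract does not specify them, and the visual grounding and modal interaction stages are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical queries, one per entity type. The paper's actual query
# wording is not given in the abstract.
TYPE_QUERIES: Dict[str, str] = {
    "PER": "Person: names of people.",
    "LOC": "Location: names of places such as cities or countries.",
    "ORG": "Organization: names of companies, teams, or institutions.",
    "MISC": "Miscellaneous: other named entities such as events or products.",
}


@dataclass
class MRCExample:
    entity_type: str
    query: str                     # carries the prior about the entity type
    context: List[str]             # tokens of the input sentence
    spans: List[Tuple[int, int]]   # inclusive (start, end) token indices


def build_mrc_examples(
    tokens: List[str], entities: List[Tuple[int, int, str]]
) -> List[MRCExample]:
    """Turn one annotated sentence into one MRC example per entity type.

    `entities` holds (start, end, type) triples with inclusive indices.
    Types with no entity in the sentence yield an example with no spans.
    """
    examples = []
    for entity_type, query in TYPE_QUERIES.items():
        spans = [(s, e) for s, e, t in entities if t == entity_type]
        examples.append(MRCExample(entity_type, query, list(tokens), spans))
    return examples


# Illustrative sentence and annotations (not taken from Twitter2015/2017).
tokens = ["Kevin", "Durant", "joins", "the", "Warriors"]
entities = [(0, 1, "PER"), (4, 4, "ORG")]

for example in build_mrc_examples(tokens, entities):
    answers = [" ".join(tokens[s : e + 1]) for s, e in example.spans]
    print(example.entity_type, "|", example.query, "|", answers)
```

In this sketch a single sentence expands into four query-context pairs, so a span extractor sees the entity type in its input instead of having to predict it as a label.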
Pages: 3549-3558
Page count: 10