Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition

Cited by: 20

Authors:

Jia, Meihuizi [1,2]

Shen, Xin [3]

Shen, Lei [2]

Pang, Jinhui [1]

Liao, Lejian [1]

Song, Yang [2]

Chen, Meng [2]

He, Xiaodong [2]

Affiliations:

[1] Beijing Inst Technol, Beijing, Peoples R China

[2] JD AI, Beijing, Peoples R China

[3] Australian Natl Univ, Canberra, ACT, Australia

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Funding:

National Key Research and Development Program;

Keywords:

multimodal named entity recognition; machine reading comprehension; visual grounding; transfer learning;

DOI:

10.1145/3503161.3548427

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Multimodal named entity recognition (MNER) is a vision-language task where the system is required to detect entity spans and corresponding entity types given a sentence-image pair. Existing methods capture text-image relations with various attention mechanisms that only obtain implicit alignments between entity types and image regions. To locate regions more accurately and better model cross-/within-modal relations, we propose a machine reading comprehension based framework for MNER, namely MRC-MNER. By utilizing queries in MRC, our framework can provide prior information about entity types and image regions. Specifically, we design two stages, Query-Guided Visual Grounding and Multi-Level Modal Interaction, to align fine-grained type-region information and simulate text-image/inner-text interactions respectively. For the former, we train a visual grounding model via transfer learning to extract region candidates that can be further integrated into the second stage to enhance token representations. For the latter, we design text-image and inner-text interaction modules along with three sub-tasks for MRC-MNER. To verify the effectiveness of our model, we conduct extensive experiments on two public MNER datasets, Twitter2015 and Twitter2017. Experimental results show that MRC-MNER outperforms the current state-of-the-art models on Twitter2017, and yields competitive results on Twitter2015.
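
The query-as-prior idea described in the abstract can be illustrated with a text-only sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the query wording in `TYPE_QUERIES`, the helper names `build_mrc_inputs` and `decode_spans`, and the nearest-end matching rule are all hypothetical, since the abstract does not specify them. The visual grounding and modal interaction stages are omitted entirely.

```python
from typing import Dict, List, Tuple

# Hypothetical natural-language queries, one per entity type.
# The wording is an assumption made for illustration only.
TYPE_QUERIES: Dict[str, str] = {
    "PER": "Person: people's names and fictional characters.",
    "LOC": "Location: countries, cities, and geographic places.",
    "ORG": "Organization: companies, teams, and institutions.",
    "MISC": "Miscellaneous: events, products, and other named entities.",
}


def build_mrc_inputs(tokens: List[str]) -> List[Tuple[str, str, List[str]]]:
    """Pair the sentence with one query per entity type.

    Each (type, query, tokens) triple is one MRC example: the query tells
    the model in advance which entity type it should look for.
    """
    return [(etype, query, tokens) for etype, query in TYPE_QUERIES.items()]


def decode_spans(start: List[int], end: List[int]) -> List[Tuple[int, int]]:
    """Match each predicted start index to the nearest end index at or after it.

    `start` and `end` are 0/1 labels per token. This matching rule is an
    assumed heuristic, not one taken from the paper.
    """
    spans = []
    for i, is_start in enumerate(start):
        if not is_start:
            continue
        for j in range(i, len(end)):
            if end[j]:
                spans.append((i, j))
                break
    return spans


if __name__ == "__main__":
    tokens = ["Kevin", "Durant", "joins", "the", "Warriors"]
    # Toy start/end labels standing in for model predictions on the PER query.
    start = [1, 0, 0, 0, 0]
    end = [0, 1, 0, 0, 0]
    for i, j in decode_spans(start, end):
        print("PER", " ".join(tokens[i : j + 1]))
```

Running one pass per query means a sentence with four candidate types yields four MRC examples, and each decoded span inherits its type from the query that produced it.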

Citation

Pages: 3549-3558

Page count: 10

50 items in total

[31] ESPVR: Entity Spans Position Visual Regions for Multimodal Named Entity Recognition
Li, Xiujiao
Sun, Guanglu
Liu, Xinyu
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 7785 - 7794
[32] Multimodal Named Entity Recognition with Bottleneck Fusion and Contrastive Learning
Wang, Peng
Chen, Xiaohang
Shang, Ziyu
Ke, Wenjun
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (04) : 545 - 555
[33] Explicit Sparse Attention Network for Multimodal Named Entity Recognition
Liu, Yunfei
Li, Shengyang
Hu, Feihu
Liu, Anqi
Liu, Yanan
KNOWLEDGE GRAPH AND SEMANTIC COMPUTING: KNOWLEDGE GRAPH EMPOWERS THE DIGITAL ECONOMY, CCKS 2022, 2022, 1669 : 83 - 94
[34] Semantics Fusion of Hierarchical Transformers for Multimodal Named Entity Recognition
Tong, Zhao
Liu, Qiang
Shi, Haichao
Xia, Yuwei
Wu, Shu
Zhang, Xiao-Yu
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14877 : 414 - 426
[35] Bovine Viral Diarrhea Virus Named Entity Recognition Based on BioBERT and MRC
Li, YinFei
Ba, YunLi
Wang, RuLin
Zhou, WeiGuang
INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2024, 38 (10)
[36] Multimodal Named Entity Recognition with Image Attributes and Image Knowledge
Chen, Dawei
Li, Zhixu
Gu, Binbin
Chen, Zhigang
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 186 - 201
[38] MMBERT: a unified framework for biomedical named entity recognition
Fu, Lei
Weng, Zuquan
Zhang, Jiheng
Xie, Haihe
Cao, Yiqing
MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2024, 62 (01) : 327 - 341
[39] MLNet: a multi-level multimodal named entity recognition architecture
Zhai, Hanming
Lv, Xiaojun
Hou, Zhiwen
Tong, Xin
Bu, Fanliang
FRONTIERS IN NEUROROBOTICS, 2023, 17
[40] CRISP: A cross-modal integration framework based on the surprisingly popular algorithm for multimodal named entity recognition
Liu, Haitao
Xin, Xianwei
Song, Jihua
Peng, Weiming
NEUROCOMPUTING, 2025, 614
