M3S: Scene Graph Driven Multi-Granularity Multi-Task Learning for Multi-Modal NER

Times cited: 18

Authors:

Wang, Jie [1,2]

Yang, Yan [1,2]

Liu, Keyu [1,2]

Zhu, Zhiping [1,2]

Liu, Xiaorong [1,2]

Affiliations:

[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China

[2] Southwest Jiaotong Univ, Mfg Ind Chains Collaborat & Informat Support Tech, Chengdu 611756, Peoples R China

Source:

IEEE/ACM Transactions on Audio, Speech, and Language Processing | 2023 / Vol. 31

Funding:

National Natural Science Foundation of China;

Keywords:

Visualization; Task analysis; Semantics; Feature extraction; Multitasking; Social networking (online); Image segmentation; Named entity recognition; multi-modal learning; scene graph;

DOI:

10.1109/TASLP.2022.3221017

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206; 082403

Abstract:

Multi-modal Named Entity Recognition (MNER), which mainly focuses on enhancing text-only NER with visual information, has recently attracted considerable attention. Most current MNER models have made significant progress by jointly understanding the visual and language modalities through layers of cross-modality attention. However, these approaches largely ignore the visual bias introduced by image contents and rarely exploit multi-granularity representations or the interactions between visual objects, both of which are essential for recognizing ambiguous entities. In this paper, we propose a Scene graph driven Multi-modal Multi-granularity Multi-task learning (M3S) framework to better exploit visual and textual information in MNER. Specifically, to explicitly alleviate visual bias, we present a novel multi-task approach that cascades Named Entity Segmentation (NES) with Named Entity Categorization (NEC). To obtain detailed visual semantics by explicitly modeling objects and the relationships between paired objects, we construct scene graphs as a structured representation of the visual contents. Furthermore, a well-designed Multi-granularity Gated Aggregation (MGA) mechanism is introduced to capture inter-modality interactions and extract critical features for named entity recognition. Extensive experiments on two public real-world datasets demonstrate the effectiveness of the proposed M3S.
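The NES-then-NEC cascade mentioned in the abstract can be illustrated by showing how one BIO tag sequence might be split into a segmentation target (entity boundaries only) and a categorization target (one type per span), and then recombined. This is a minimal sketch assuming a standard BIO tagging scheme; the helper names, the example sentence, and the span decoding rule are hypothetical and are not taken from the M3S paper, which this record does not describe at that level of detail. No model or visual features are involved, only the label decomposition a cascaded setup implies.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def to_segmentation_labels(bio_tags: List[str]) -> List[str]:
    """NES target (hypothetical helper): drop entity types, keep B/I/O boundaries."""
    return [tag.split("-", 1)[0] for tag in bio_tags]


def extract_spans(seg_labels: List[str]) -> List[Span]:
    """Group B/I boundary labels into contiguous entity spans.

    A stray "I" with no open span is treated leniently as the start of a span.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    for i, label in enumerate(seg_labels):
        if label == "B" or (label == "I" and start is None):
            if start is not None:  # close the previous span first
                spans.append((start, i))
            start = i
        elif label == "O" and start is not None:
            spans.append((start, i))
            start = None
        # "I" with an open span simply extends it
    if start is not None:
        spans.append((start, len(seg_labels)))
    return spans


def span_types(bio_tags: List[str], spans: List[Span]) -> List[str]:
    """NEC target (hypothetical helper): one entity type per span, read from its first tag."""
    return [bio_tags[s].split("-", 1)[1] for s, _ in spans]


def recombine(length: int, spans: List[Span], types: List[str]) -> List[str]:
    """Cascade step: merge segmented spans with predicted types back into BIO tags."""
    tags = ["O"] * length
    for (start, end), etype in zip(spans, types):
        tags[start] = f"B-{etype}"
        for j in range(start + 1, end):
            tags[j] = f"I-{etype}"
    return tags


# Hypothetical example sentence and gold tags
tokens = ["Kevin", "Durant", "enters", "Oracle", "Arena"]
gold = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]

seg = to_segmentation_labels(gold)  # ['B', 'I', 'O', 'B', 'I']
spans = extract_spans(seg)          # [(0, 2), (3, 5)]
types = span_types(gold, spans)     # ['PER', 'LOC']
assert recombine(len(tokens), spans, types) == gold
```

Splitting the labels this way means the segmentation stage only has to decide where entities are, while the type decision is made per span afterwards; whether M3S uses exactly this decomposition is an assumption here.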

Pages: 111-120

Number of pages: 10
