M3S: Scene Graph Driven Multi-Granularity Multi-Task Learning for Multi-Modal NER

Cited by: 18

Authors:

Wang, Jie [1,2]

Yang, Yan [1,2]

Liu, Keyu [1,2]

Zhu, Zhiping [1,2]

Liu, Xiaorong [1,2]

Affiliations:

[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China

[2] Southwest Jiaotong Univ, Mfg Ind Chains Collaborat & Informat Support Tech, Chengdu 611756, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2023 / Vol. 31

Funding:

National Natural Science Foundation of China;

Keywords:

Visualization; Task analysis; Semantics; Feature extraction; Multitasking; Social networking (online); Image segmentation; Named entity recognition; multi-modal learning; scene graph;

DOI:

10.1109/TASLP.2022.3221017

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Multi-modal Named Entity Recognition (MNER), which mainly focuses on enhancing text-only NER with visual information, has recently attracted considerable attention. Most current MNER models have made significant progress by jointly understanding visual and language modalities through layers of cross-modality attention. However, these approaches largely ignore the visual bias brought by the image contents and barely consider exploiting the multi-granularity representations and the interactions between visual objects, which are essential in recognizing ambiguous entities. In this paper, we propose a Scene graph driven Multi-modal Multi-granularity Multi-task learning (M3S) framework to better exploit visual and textual information in MNER. Specifically, to explicitly alleviate visual bias, we present a novel multi-task approach by employing the task of Named Entity Segmentation (NES) cascaded with Named Entity Categorization (NEC). To obtain detailed visual semantics by explicitly modeling objects and relationships between paired objects, we construct scene graphs as a structured representation of the visual contents. Furthermore, a well-designed Multi-granularity Gated Aggregation (MGA) mechanism is introduced to capture inter-modality interactions and extract critical features for named entity recognition. Extensive experiments on two real public datasets demonstrate the effectiveness of our proposed M3S.


Pages: 111-120

Page count: 10

