M3S: Scene Graph Driven Multi-Granularity Multi-Task Learning for Multi-Modal NER

Cited by: 18
Authors
Wang, Jie [1, 2]
Yang, Yan [1, 2]
Liu, Keyu [1, 2]
Zhu, Zhiping [1, 2]
Liu, Xiaorong [1, 2]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[2] Southwest Jiaotong Univ, Mfg Ind Chains Collaborat & Informat Support Tech, Chengdu 611756, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Task analysis; Semantics; Feature extraction; Multitasking; Social networking (online); Image segmentation; Named entity recognition; multi-modal learning; scene graph;
DOI
10.1109/TASLP.2022.3221017
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Multi-modal Named Entity Recognition (MNER), which mainly focuses on enhancing text-only NER with visual information, has recently attracted considerable attention. Most current MNER models have made significant progress by jointly understanding the visual and language modalities through layers of cross-modality attention. However, these approaches largely ignore the visual bias introduced by image content, and they rarely exploit multi-granularity representations or the interactions between visual objects, both of which are essential for recognizing ambiguous entities. In this paper, we propose a Scene graph driven Multi-modal Multi-granularity Multi-task learning (M3S) framework to better exploit visual and textual information in MNER. Specifically, to explicitly alleviate visual bias, we present a novel multi-task approach that cascades the task of Named Entity Segmentation (NES) with Named Entity Categorization (NEC). To obtain detailed visual semantics by explicitly modeling objects and the relationships between paired objects, we construct scene graphs as a structured representation of the visual content. Furthermore, a well-designed Multi-granularity Gated Aggregation (MGA) mechanism is introduced to capture inter-modality interactions and extract critical features for named entity recognition. Extensive experiments on two real-world public datasets demonstrate the effectiveness of the proposed M3S.
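The NES/NEC cascade mentioned in the abstract can be illustrated at the label level. The following is a minimal sketch, not the authors' implementation: it assumes a standard BIO tagging scheme (the abstract does not state which scheme is used), and the function names `split_bio` and `merge_bio` as well as the example tags are hypothetical. It shows how typed NER tags decompose into a segmentation target (where entities are) and a categorization target (what type each span has), and how the two recombine.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def split_bio(tags: List[str]) -> Tuple[List[str], List[Span]]:
    """Split typed BIO tags into segmentation tags and typed spans.

    Assumption: tags look like "B-PER", "I-PER", "O" (BIO scheme).
    """
    # Segmentation view: keep only the boundary part of each tag.
    seg_tags = [t.split("-", 1)[0] for t in tags]

    # Categorization view: one entity type per segmented span.
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, label))
            start = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return seg_tags, spans


def merge_bio(seg_tags: List[str], spans: List[Span]) -> List[str]:
    """Recombine segmentation tags and typed spans into typed BIO tags."""
    tags = list(seg_tags)
    for start, end, label in spans:
        for i in range(start, end):
            tags[i] = f"{tags[i]}-{label}"
    return tags


if __name__ == "__main__":
    # Hypothetical example sentence: "Kevin Durant joins Brooklyn"
    gold = ["B-PER", "I-PER", "O", "B-LOC"]
    seg, spans = split_bio(gold)
    print(seg)    # segmentation target
    print(spans)  # categorization target
    print(merge_bio(seg, spans) == gold)
```

In this decomposition the segmentation tags carry no type information, so a model trained on them only has to locate entity boundaries, while the type decision is made once per span.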
Pages: 111 - 120
Number of pages: 10