M3S: Scene Graph Driven Multi-Granularity Multi-Task Learning for Multi-Modal NER

Cited by: 18
Authors
Wang, Jie [1, 2]
Yang, Yan [1, 2]
Liu, Keyu [1, 2]
Zhu, Zhiping [1, 2]
Liu, Xiaorong [1, 2]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[2] Southwest Jiaotong Univ, Mfg Ind Chains Collaborat & Informat Support Tech, Chengdu 611756, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Task analysis; Semantics; Feature extraction; Multitasking; Social networking (online); Image segmentation; Named entity recognition; multi-modal learning; scene graph;
DOI
10.1109/TASLP.2022.3221017
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Multi-modal Named Entity Recognition (MNER), which mainly focuses on enhancing text-only NER with visual information, has recently attracted considerable attention. Most current MNER models have made significant progress by jointly understanding visual and language modalities through layers of cross-modality attention. However, these approaches largely ignore the visual bias introduced by the image contents and rarely exploit multi-granularity representations or the interactions between visual objects, both of which are essential for recognizing ambiguous entities. In this paper, we propose a Scene graph driven Multi-modal Multi-granularity Multi-task learning (M3S) framework to better exploit visual and textual information in MNER. Specifically, to explicitly alleviate visual bias, we present a novel multi-task approach that cascades the task of Named Entity Segmentation (NES) with Named Entity Categorization (NEC). To obtain detailed visual semantics by explicitly modeling objects and relationships between paired objects, we construct scene graphs as a structured representation of the visual contents. Furthermore, a Multi-granularity Gated Aggregation (MGA) mechanism is introduced to capture inter-modality interactions and extract critical features for named entity recognition. Extensive experiments on two real-world public datasets demonstrate the effectiveness of the proposed M3S.
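The cascade of Named Entity Segmentation (NES) and Named Entity Categorization (NEC) mentioned in the abstract can be illustrated with a small label-decomposition sketch. The abstract does not specify the tagging scheme or any model interface, so the BIO scheme, the function name `split_ner_tags`, and the example tokens below are assumptions made for illustration only; this is not the authors' implementation.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def split_ner_tags(tags: List[str]) -> Tuple[List[str], List[Span]]:
    """Split typed BIO tags into two sub-task targets (hypothetical helper).

    Returns:
      seg_tags: type-free B/I/O tags, a possible target for segmentation.
      spans:    typed entity spans, a possible target for categorization.
    """
    # "B-PER" -> "B", "I-LOC" -> "I", "O" -> "O"
    seg_tags = [tag.split("-", 1)[0] for tag in tags]

    spans: List[Span] = []
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that ends the sentence.
    for i, tag in enumerate(tags + ["O"]):
        # Any tag other than a continuation ("I-") ends the open span.
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return seg_tags, spans


if __name__ == "__main__":
    # Made-up example sentence and tags, not taken from the paper's datasets.
    tokens = ["Kevin", "Durant", "enters", "Oracle", "Arena", "tonight"]
    tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
    seg_tags, spans = split_ner_tags(tags)
    print(seg_tags)
    for start, end, etype in spans:
        print(" ".join(tokens[start:end]), "->", etype)
```

Under this reading, the first stage only decides where entities begin and end, and the second stage assigns a type to each detected span, so an error in the type decision does not change the span boundaries.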
Pages: 111-120
Number of pages: 10