VEMO: A Versatile Elastic Multi-modal Model for Search-Oriented Multi-task Learning

Cited by: 0
Authors
Fei, Nanyi [1 ]
Jiang, Hao [2 ]
Lu, Haoyu [3 ]
Long, Jinqiang [3 ]
Dai, Yanqi [3 ]
Fan, Tuo [2 ]
Cao, Zhao [2 ]
Lu, Zhiwu [3 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[2] Huawei Poisson Lab, Hangzhou, Zhejiang, Peoples R China
[3] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
Source
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I | 2024, Vol. 14608
Funding
National Natural Science Foundation of China;
Keywords
multi-modal model; multi-task learning; cross-modal search;
DOI
10.1007/978-3-031-56027-9_4
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal search is a fundamental task in multi-modal learning, yet hardly any work aims to solve multiple cross-modal search tasks at once. In this work, we propose a novel Versatile Elastic Multi-mOdal (VEMO) model for search-oriented multi-task learning. VEMO is versatile because we integrate cross-modal semantic search, named entity recognition, and scene text spotting into a unified framework, where the latter two can be further adapted to entity- and character-based image search tasks. VEMO is also elastic because we can freely assemble sub-modules of our flexible network architecture for the corresponding tasks. Moreover, to offer more choices on the effect-efficiency trade-off when performing cross-modal semantic search, we place multiple encoder exits. Experimental results show the effectiveness of our VEMO with only 37.6% of the network parameters required by uni-task training. Further evaluations on entity- and character-based image search tasks also validate the superiority of search-oriented multi-task learning.
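The "multiple encoder exits" mentioned in the abstract can be illustrated with a small sketch: an encoder that can stop at an earlier layer and project that layer's output into the shared embedding space, trading accuracy for speed. This is only a minimal illustration of the general early-exit idea, assuming hypothetical layer counts, exit placements, and head shapes; it is not VEMO's actual architecture or configuration.

```python
import torch
import torch.nn as nn

class MultiExitEncoder(nn.Module):
    """Transformer encoder with exits after selected layers (illustrative only).

    Depth, width, and exit placement here are assumptions for the sketch,
    not the configuration used by VEMO.
    """

    def __init__(self, dim=256, num_layers=6, exit_layers=(2, 4, 6)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        self.exit_layers = set(exit_layers)
        # One projection head per exit; all heads map into the same
        # shared embedding space so any exit can serve retrieval.
        self.heads = nn.ModuleDict({str(i): nn.Linear(dim, dim) for i in exit_layers})

    def forward(self, x, exit_at=None):
        """Run the encoder up to `exit_at` layers and return that exit's embedding."""
        if exit_at is None:
            exit_at = max(self.exit_layers)
        if exit_at not in self.exit_layers:
            raise ValueError(f"exit_at={exit_at} is not a valid exit layer")
        for i, layer in enumerate(self.layers, start=1):
            x = layer(x)
            if i == exit_at:
                # Mean-pool the token sequence, then project with this exit's head.
                return self.heads[str(i)](x.mean(dim=1))

enc = MultiExitEncoder()
tokens = torch.randn(2, 16, 256)  # (batch, seq_len, dim)
fast = enc(tokens, exit_at=2)     # cheaper, earlier exit
full = enc(tokens, exit_at=6)     # full-depth encoding
print(fast.shape, full.shape)
```

At inference time, a caller can pick the exit per query, so one set of weights serves both latency-sensitive and accuracy-sensitive search traffic.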
Pages: 56 - 72
Page count: 17
Related Papers (50 in total)
  • [2] Multi-modal microblog classification via multi-task learning
    Zhao, Sicheng
    Yao, Hongxun
    Zhao, Sendong
    Jiang, Xuesong
    Jiang, Xiaolei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (15) : 8921 - 8938
  • [3] Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition
    Fan, Dinghao
    Lu, Hengjie
    Xu, Shugong
    Cao, Shan
    IEEE SENSORS JOURNAL, 2021, 21 (23) : 27026 - 27036
  • [4] Multi-modal Sentiment and Emotion Joint Analysis with a Deep Attentive Multi-task Learning Model
    Zhang, Yazhou
    Rong, Lu
    Li, Xiang
    Chen, Rui
    ADVANCES IN INFORMATION RETRIEVAL, PT I, 2022, 13185 : 518 - 532
  • [5] MultiMAE: Multi-modal Multi-task Masked Autoencoders
    Bachmann, Roman
    Mizrahi, David
    Atanov, Andrei
    Zamir, Amir
    COMPUTER VISION, ECCV 2022, PT XXXVII, 2022, 13697 : 348 - 367
  • [6] Multi-Modal Meta Multi-Task Learning For Social Media Rumor Detection
    Poornima, R.
    Nagavarapu, Sateesh
    Navya, Soleti
    Katkoori, Arun Kumar
    Mohsen, Karrar Shareef
    Saikumar, K.
    2024 2ND WORLD CONFERENCE ON COMMUNICATION & COMPUTING, WCONF 2024, 2024,
  • [7] Multi-Modal Meta Multi-Task Learning for Social Media Rumor Detection
    Zhang, Huaiwen
    Qian, Shengsheng
    Fang, Quan
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1449 - 1459
  • [8] Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings
    Yi, Jiangyan
    Tao, Jianhua
    Fu, Ruibo
    Wang, Tao
    Zhang, Chu Yuan
    Wang, Chenglong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 2963 - 2973
  • [9] Fake News Detection in Social Media based on Multi-Modal Multi-Task Learning
    Cui, Xinyu
    Li, Yang
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (07) : 912 - 918
  • [10] Multi-Task Federated Split Learning Across Multi-Modal Data with Privacy Preservation
    Dong, Yipeng
    Luo, Wei
    Wang, Xiangyang
    Zhang, Lei
    Xu, Lin
    Zhou, Zehao
    Wang, Lulu
    SENSORS, 2025, 25 (01)