VEMO: A Versatile Elastic Multi-modal Model for Search-Oriented Multi-task Learning

Citations: 0
Authors
Fei, Nanyi [1]
Jiang, Hao [2]
Lu, Haoyu [3]
Long, Jinqiang [3]
Dai, Yanqi [3]
Fan, Tuo [2]
Cao, Zhao [2]
Lu, Zhiwu [3]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[2] Huawei Poisson Lab, Hangzhou, Zhejiang, Peoples R China
[3] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
Source
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I | 2024 / Vol. 14608
Funding
National Natural Science Foundation of China;
Keywords
multi-modal model; multi-task learning; cross-modal search;
DOI
10.1007/978-3-031-56027-9_4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Cross-modal search is one fundamental task in multi-modal learning, but there is hardly any work that aims to solve multiple cross-modal search tasks at once. In this work, we propose a novel Versatile Elastic Multi-mOdal (VEMO) model for search-oriented multi-task learning. VEMO is versatile because we integrate cross-modal semantic search, named entity recognition, and scene text spotting into a unified framework, where the latter two can be further adapted to entity- and character-based image search tasks. VEMO is also elastic because we can freely assemble sub-modules of our flexible network architecture for corresponding tasks. Moreover, to give more choices on the effect-efficiency trade-off when performing cross-modal semantic search, we place multiple encoder exits. Experimental results show the effectiveness of our VEMO with only 37.6% network parameters compared to those needed for uni-task training. Further evaluations on entity- and character-based image search tasks also validate the superiority of search-oriented multi-task learning.
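The "multiple encoder exits" idea in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' architecture: the abstract gives no layer design, exit positions, or training details, so `make_layer`, `MultiExitEncoder`, `cosine`, and `search` are hypothetical names, and the toy `tanh` layers merely stand in for real encoder blocks. The sketch only shows the general trade-off: exiting at a shallower depth runs fewer layers (cheaper), while a deeper exit runs more (costlier, presumably more accurate).

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_layer(scale: float, shift: float) -> Layer:
    """Hypothetical stand-in for one encoder block (not a real Transformer layer)."""
    def layer(x: Vector) -> Vector:
        return [math.tanh(scale * v + shift) for v in x]
    return layer


class MultiExitEncoder:
    """Toy encoder whose forward pass can stop at any registered exit depth."""

    def __init__(self, layers: Sequence[Layer], exits: Sequence[int]) -> None:
        if any(e < 1 or e > len(layers) for e in exits):
            raise ValueError("each exit must be a depth between 1 and len(layers)")
        self.layers = list(layers)
        self.exits = sorted(set(exits))

    def encode(self, x: Vector, exit_depth: int) -> Vector:
        """Run only the first `exit_depth` layers, then L2-normalise the output."""
        if exit_depth not in self.exits:
            raise ValueError(f"no exit at depth {exit_depth}; available: {self.exits}")
        h = list(x)
        for layer in self.layers[:exit_depth]:
            h = layer(h)
        norm = math.sqrt(sum(v * v for v in h)) or 1.0  # guard against a zero vector
        return [v / norm for v in h]


def cosine(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity for L2-normalised inputs."""
    return sum(p * q for p, q in zip(a, b))


def search(query: Vector, gallery: Sequence[Vector]) -> List[int]:
    """Return gallery indices ranked by descending similarity to the query."""
    return sorted(range(len(gallery)), key=lambda i: cosine(query, gallery[i]), reverse=True)


# Six toy layers with exits after layers 2, 4, and 6 (depths chosen arbitrarily).
encoder = MultiExitEncoder([make_layer(1.0, 0.1 * i) for i in range(6)], exits=[2, 4, 6])
fast_embedding = encoder.encode([0.5, -0.2, 0.1], exit_depth=2)  # cheaper
full_embedding = encoder.encode([0.5, -0.2, 0.1], exit_depth=6)  # costlier
```

In a real cross-modal system, queries and gallery items would need to be embedded at matching exits (or exits trained to be compatible) for the similarity scores to be meaningful; the sketch assumes the same exit is used on both sides.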
Pages: 56-72
Number of pages: 17