VEMO: A Versatile Elastic Multi-modal Model for Search-Oriented Multi-task Learning

Cited by: 0

Authors:

Fei, Nanyi [1]

Jiang, Hao [2]

Lu, Haoyu [3]

Long, Jinqiang [3]

Dai, Yanqi [3]

Fan, Tuo [2]

Cao, Zhao [2]

Lu, Zhiwu [3]

Affiliations:

[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China

[2] Huawei Poisson Lab, Hangzhou, Zhejiang, Peoples R China

[3] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China

Source:

ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I | 2024 / Vol. 14608

Funding:

National Natural Science Foundation of China;

Keywords:

multi-modal model; multi-task learning; cross-modal search;

DOI:

10.1007/978-3-031-56027-9_4

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Cross-modal search is a fundamental task in multi-modal learning, but hardly any work aims to solve multiple cross-modal search tasks at once. In this work, we propose a novel Versatile Elastic Multi-mOdal (VEMO) model for search-oriented multi-task learning. VEMO is versatile because it integrates cross-modal semantic search, named entity recognition, and scene text spotting into a unified framework, where the latter two can be further adapted to entity- and character-based image search tasks. VEMO is also elastic because the sub-modules of its flexible network architecture can be freely assembled for the corresponding tasks. Moreover, to offer more choices on the effectiveness-efficiency trade-off when performing cross-modal semantic search, we place multiple encoder exits. Experimental results show the effectiveness of VEMO with only 37.6% of the network parameters needed for uni-task training. Further evaluations on entity- and character-based image search tasks also validate the superiority of search-oriented multi-task learning.
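
The idea of placing multiple encoder exits can be illustrated with a minimal sketch. This is not the VEMO architecture, which the abstract does not specify; the class `MultiExitEncoder`, its parameters, and the random tanh layers standing in for real encoder blocks are all hypothetical. The sketch only shows how a caller might pick a shallower exit for cheaper embeddings or a deeper exit for more computation.

```python
import math
import random


class MultiExitEncoder:
    """Toy layer stack with early exits (hypothetical, not the VEMO model)."""

    def __init__(self, dim, num_layers, exit_layers, seed=0):
        rng = random.Random(seed)
        # Each layer is a random dim x dim matrix standing in for an encoder block.
        self.layers = [
            [[rng.uniform(-1, 1) / math.sqrt(dim) for _ in range(dim)]
             for _ in range(dim)]
            for _ in range(num_layers)
        ]
        self.exit_layers = sorted(exit_layers)
        if not all(1 <= e <= num_layers for e in self.exit_layers):
            raise ValueError("exit layers must lie in 1..num_layers")

    @staticmethod
    def _apply(weights, x):
        # Linear map followed by a tanh non-linearity.
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

    def encode(self, x, exit_layer):
        """Run only the first `exit_layer` layers and return a unit-norm embedding."""
        if exit_layer not in self.exit_layers:
            raise ValueError(f"no exit placed after layer {exit_layer}")
        h = list(x)
        for weights in self.layers[:exit_layer]:
            h = self._apply(weights, h)
        norm = math.sqrt(sum(v * v for v in h)) or 1.0
        return [v / norm for v in h]


def cosine(a, b):
    # Inputs are already unit-norm, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


encoder = MultiExitEncoder(dim=4, num_layers=6, exit_layers=[2, 4, 6])
query = [0.5, -0.1, 0.3, 0.9]
shallow = encoder.encode(query, exit_layer=2)  # cheaper: 2 layers of compute
deep = encoder.encode(query, exit_layer=6)     # costlier: all 6 layers
```

In a search setting, embeddings from the same exit would be compared with `cosine`; which exit gives an acceptable quality at a given cost is an empirical question that this sketch does not address.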


Pages: 56-72

Page count: 17

