Continual learning for cross-modal image-text retrieval based on domain-selective attention

Cited by: 9
Authors
Yang, Rui [1 ]
Wang, Shuang [1 ,2 ]
Gu, Yu [1 ]
Wang, Jihui [1 ]
Sun, Yingzhi [1 ]
Zhang, Huan [1 ]
Liao, Yu [1 ]
Jiao, Licheng [1 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Shaanxi, Peoples R China
[2] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; Continual learning; Attention; Weight regularization;
DOI
10.1016/j.patcog.2024.110273
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal image-text retrieval (CMITR) has been a high-value research topic for more than a decade. Most previous studies train the data for all tasks as a single set. In reality, however, a more likely scenario is that the dataset comprises multiple tasks that are trained in sequence. The consequence is a limited ability to remember an old task once a new task arrives; in other words, catastrophic forgetting. To address this issue, this paper proposes a novel continual learning for cross-modal image-text retrieval (CLCMR) method to alleviate catastrophic forgetting. We construct a multilayer domain-selective attention (MDSA) based network to acquire knowledge at both task-relevant and domain-specific attention levels. Moreover, a memory factor is designed to achieve weight regularization, and a novel memory loss function is used to constrain MDSA. Extensive experimental results on multiple datasets (Wikipedia, Pascal Sentence, and PKU XMediaNet) demonstrate that CLCMR effectively alleviates catastrophic forgetting and achieves superior continual learning ability compared with state-of-the-art methods.
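The abstract describes weight regularization via a per-parameter memory factor combined with a memory loss. The paper's exact formulation is not reproduced in this record; as a rough illustration only, the sketch below shows a generic EWC-style quadratic penalty of the kind the abstract hints at, where parameters important for previous tasks (large memory factor) are pulled back toward their post-old-task values. The names `memory_loss`, `total_loss`, `memory_factors`, and `lam` are hypothetical, not taken from the paper.

```python
import numpy as np

def memory_loss(params, old_params, memory_factors):
    """Quadratic weight-regularization penalty for continual learning.

    Each parameter is penalized for drifting from its value after the
    previous task, scaled by a per-parameter memory factor (a large
    factor marks a parameter as important for the old task).
    """
    return float(np.sum(memory_factors * (params - old_params) ** 2))

def total_loss(task_loss, params, old_params, memory_factors, lam=1.0):
    """Continual-learning objective: current-task loss plus the memory
    penalty, traded off by the hyperparameter lam."""
    return task_loss + lam * memory_loss(params, old_params, memory_factors)
```

If the parameters have not moved since the previous task, the penalty is zero, so only parameters that drift on important weights are punished; this is the standard mechanism by which regularization-based methods mitigate catastrophic forgetting.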
Pages: 13