Complementary expert balanced learning for long-tail cross-modal retrieval

Cited by: 0
Authors
Liu, Peifang [1]
Liu, Xueliang [1,2]
Affiliations
[1] Hefei Univ Technol, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; Online distillation; Long-tailed learning;
DOI
10.1007/s00530-024-01317-9
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Cross-modal retrieval aims to project high-dimensional cross-modal data into a common low-dimensional space. Previous work relies on balanced datasets for training, but as massive real-world datasets grow, the long-tail phenomenon appears in more and more of them, and training on such imbalanced data has become an emerging challenge. In this paper, we propose complementary expert balanced learning for long-tail cross-modal retrieval to alleviate the impact of long-tail data. In our solution, we design a complementary multi-expert framework to balance the differences between the image and text modalities. For each expert individually, we design a pairs loss to find the common feature space of images and texts. Moreover, a balancing process is proposed to mitigate the impact of the long tail on the retrieval accuracy of each expert network. In addition, we propose complementary online distillation to enable collaboration among the individual experts and improve image-text matching: each expert allows mutual learning between the two modalities, and multiple experts complement one another in learning the feature embedding across the two modalities. Finally, to address the reduction in the amount of training data after long-tail processing, we propose high-score retraining, which also helps the network capture global, robust features with fine-grained discrimination. Experimental results on widely used benchmark datasets show that the proposed method is effective for long-tail cross-modal learning.
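
This record gives only the abstract, so the method's exact losses are not specified here. As a rough, non-authoritative sketch of the general pattern the abstract describes (several expert heads projecting both modalities into a shared space, a per-expert image-text pairs loss, and online distillation letting the experts teach one another), a minimal PyTorch version might look as follows. Every class and function name, the InfoNCE-style choice of pairs loss, and all dimensions and temperatures are assumptions for illustration, not the authors' formulation.

# Hedged illustration, not the authors' implementation: all names, the
# InfoNCE-style pairs loss, and the hyperparameters below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One expert: separate projection heads mapping image and text
    features into a shared, L2-normalized embedding space."""
    def __init__(self, img_dim=2048, txt_dim=768, emb_dim=256):
        super().__init__()
        self.img_proj = nn.Sequential(
            nn.Linear(img_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim))
        self.txt_proj = nn.Sequential(
            nn.Linear(txt_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim))

    def forward(self, img_feat, txt_feat):
        v = F.normalize(self.img_proj(img_feat), dim=-1)
        t = F.normalize(self.txt_proj(txt_feat), dim=-1)
        return v, t

def pairs_loss(v, t, tau=0.07):
    # Symmetric InfoNCE-style matching loss: the i-th image and i-th text
    # in the batch are positives, all other pairings are negatives.
    logits = v @ t.t() / tau
    labels = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

def mutual_distill(logits_a, logits_b, tau=1.0):
    # Online mutual distillation: each expert's similarity distribution
    # serves as a (detached) soft target for the other expert.
    log_pa = F.log_softmax(logits_a / tau, dim=-1)
    log_pb = F.log_softmax(logits_b / tau, dim=-1)
    return 0.5 * (F.kl_div(log_pa, log_pb.detach().exp(), reduction="batchmean")
                  + F.kl_div(log_pb, log_pa.detach().exp(), reduction="batchmean"))

experts = nn.ModuleList([Expert(), Expert()])
img, txt = torch.randn(8, 2048), torch.randn(8, 768)   # dummy batch features

sims, loss = [], 0.0
for expert in experts:
    v, t = expert(img, txt)
    loss = loss + pairs_loss(v, t)      # per-expert image-text pairs loss
    sims.append(v @ t.t())
loss = loss + mutual_distill(sims[0], sims[1])  # experts complement each other
loss.backward()

The paper's balancing process and high-score retraining would sit on top of such a skeleton (for example, by reweighting or resampling tail classes in the pairs loss and by retraining on confidently matched pairs), but their concrete form cannot be inferred from the abstract alone.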
Pages: 11