Source-Free Image-Text Matching via Uncertainty-Aware Learning

Cited by: 0

Authors:

Tian, Mengxiao ^{[1,2]}

Yang, Shuo ^{[3]}

Wu, Xinxiao ^{[1,2]}

Jia, Yunde ^{[3]}

Affiliations:

[1] Beijing Inst Technol, Sch Comp Sci, Beijing Lab Intelligent Informat Technol, Beijing 100081, Peoples R China

[2] Shenzhen MSU BIT Univ, Guangdong Prov Lab Machine Percept & Intelligent C, Shenzhen 518172, Peoples R China

[3] Shenzhen MSU BIT Univ, Guangdong Prov Lab Machine Percept & Intelligent C, Shenzhen 518172, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2024 / Vol. 31

Keywords:

Adaptation models; Uncertainty; Noise measurement; Data models; Training; Noise; Visualization; Measurement uncertainty; Computational modeling; Testing; Image-text matching; source-free adaptation; uncertainty-aware learning;

DOI:

10.1109/LSP.2024.3488521

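Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code: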

0808 ; 0809 ;

Abstract:

When applying a trained image-text matching model to a new scenario, the performance may largely degrade due to domain shift, which makes it impractical in real-world applications. In this paper, we make the first attempt at adapting an image-text matching model well-trained on a labeled source domain to an unlabeled target domain in the absence of source data, namely, source-free image-text matching. This task is challenging since it has no direct access to the source data when learning to reduce the domain shift. To address this challenge, we propose a simple yet effective method that introduces uncertainty-aware learning to generate high-quality pseudo-pairs of image and text for target adaptation. Specifically, we first use the pre-trained source model to retrieve several top-ranked image-text pairs from the target domain as pseudo-pairs. We then model the uncertainty of each pseudo-pair by calculating the variance of the retrieved texts (resp. images) given the paired image (resp. text) as query. Finally, we incorporate the uncertainty into an objective function to down-weight noisy pseudo-pairs for better training, thereby enhancing adaptation. This uncertainty-aware training approach can be generally applied to all existing models. Extensive experiments on the COCO and Flickr30K datasets demonstrate the effectiveness of the proposed method.
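The three steps in the abstract (retrieve pseudo-pairs, estimate their uncertainty from the variance of the retrieved items, down-weight noisy pairs in the loss) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: cosine similarity as the matching score, mean per-dimension variance of the top-k retrieved embeddings as the uncertainty, `exp(-uncertainty)` as the weight, and the hypothetical helper names `build_pseudo_pairs` and `weighted_loss`. It is not the authors' implementation.

```python
import math
from statistics import pvariance


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k(query, candidates, k):
    """Indices of the k candidates most similar to the query, best first."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)[:k]


def spread(vectors):
    """Mean per-dimension (population) variance of a set of embeddings."""
    return sum(pvariance(dim) for dim in zip(*vectors)) / len(vectors[0])


def build_pseudo_pairs(img_emb, txt_emb, k=2):
    """Return (image_idx, text_idx, uncertainty, weight) for each target image.

    Assumed scheme: the top-1 retrieved text forms the pseudo-pair; uncertainty
    is the variance of the top-k texts retrieved by the image plus the variance
    of the top-k images retrieved by the paired text.
    """
    pairs = []
    for i, img in enumerate(img_emb):
        ranked_txt = top_k(img, txt_emb, k)
        j = ranked_txt[0]
        u_i2t = spread([txt_emb[t] for t in ranked_txt])
        ranked_img = top_k(txt_emb[j], img_emb, k)
        u_t2i = spread([img_emb[m] for m in ranked_img])
        uncertainty = u_i2t + u_t2i
        weight = math.exp(-uncertainty)  # assumed weighting; higher variance, lower weight
        pairs.append((i, j, uncertainty, weight))
    return pairs


def weighted_loss(pairs, per_pair_losses):
    """Weighted average of per-pair matching losses, so noisy pairs count less."""
    total = sum(w for _, _, _, w in pairs)
    return sum(w * l for (_, _, _, w), l in zip(pairs, per_pair_losses)) / total


# Toy target-domain embeddings (made-up numbers, for illustration only).
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.5]]
for i, j, u, w in build_pseudo_pairs(images, texts, k=2):
    print(f"image {i} <-> text {j}: uncertainty={u:.4f}, weight={w:.4f}")
```

In this toy data the texts retrieved for the first image are tightly clustered while those retrieved for the second are spread out, so the second pseudo-pair receives the smaller weight. In a real setting the embeddings would come from the pre-trained source model and the per-pair losses from whatever matching objective that model uses.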


Pages: 3059-3063

Number of pages: 5

50 items in total

[21] Uncertainty-Aware Sparse Transformer Network for Single-Image Deraindrop
Fu, Bo
Jiang, Yunyun
Wang, Di
Gao, Jiaxin
Wang, Cong
Li, Ximing
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73
[22] Incremental Pedestrian Attribute Recognition via Dual Uncertainty-Aware Pseudo-Labeling
Li, Da
Zhang, Zhang
Shan, Caifeng
Wang, Liang
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2023, 18 : 2622 - 2636
[23] Bi-Attention enhanced representation learning for image-text matching
Tian, Yumin
Ding, Aqiang
Wang, Di
Luo, Xuemei
Wan, Bo
Wang, Yifeng
PATTERN RECOGNITION, 2023, 140
[24] Self-attention guided representation learning for image-text matching
Qi, Xuefei
Zhang, Ying
Qi, Jinqing
Lu, Huchuan
NEUROCOMPUTING, 2021, 450 : 143 - 155
[25] Learning Source-Free Domain Adaptation for Infrared Small Target Detection
Jin, Hongxu
Chen, Baiyang
Lu, Qianwen
Tao, Qingchuan
Li, Yongxiang
IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 1121 - 1125
[26] Learning Fragment Self-Attention Embeddings for Image-Text Matching
Wu, Yiling
Wang, Shuhui
Song, Guoli
Huang, Qingming
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2088 - 2096
[27] Cross-modal multi-relationship aware reasoning for image-text matching
Jin Zhang
Xiaohai He
Linbo Qing
Luping Liu
Xiaodong Luo
Multimedia Tools and Applications, 2022, 81 : 12005 - 12027
[28] Active Learning of Discrete-Time Dynamics for Uncertainty-Aware Model Predictive Control
Saviolo, Alessandro
Frey, Jonathan
Rathod, Abhishek
Diehl, Moritz
Loianno, Giuseppe
IEEE TRANSACTIONS ON ROBOTICS, 2024, 40 : 1273 - 1291
[29] Context-Aware Multi-View Summarization Network for Image-Text Matching
Qu, Leigang
Liu, Meng
Cao, Da
Nie, Liqiang
Tian, Qi
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1047 - 1055
[30] Cross-modal multi-relationship aware reasoning for image-text matching
Zhang, Jin
He, Xiaohai
Qing, Linbo
Liu, Luping
Luo, Xiaodong
MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (09) : 12005 - 12027
