CLCP: Realtime Text-Image Retrieval for Retailing via Pre-trained Clustering and Priority Queue

Times cited: 0
Authors
Zhang, Shuyang [1]
Wei, Liangwu [1]
Wang, Qingyu [1]
Wei, Yuntao [1]
Song, Yanzhi [1]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
Source
PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024 | 2024
Funding
National Key Research and Development Program of China;
Keywords
text-image retrieval; contrastive loss; vision-language model;
DOI
10.1145/3652583.3657608
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Real-time matching between customer demands and product information via text-image retrieval remains a fundamental problem in intelligent retailing. However, this process faces challenges in data quality, multi-modal retrieval strategy, and runtime efficiency. To address these challenges, we propose a cross-modality retrieval pipeline that leverages a contrastive loss and a novel sampling strategy. We treat text-image retrieval as a two-stage process involving unsupervised clustering and contrastive feature representation. Additionally, we create an image-caption matching dataset by expanding the Grocery Store Dataset using a vision-language foundation model. Our experiments demonstrate the effectiveness of our method on both the new expanded dataset and the well-known cross-modality retrieval benchmark Flickr30k.
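The abstract gives no implementation details, so the following is only a minimal sketch of how a cluster-then-rank retrieval with a priority queue could look, not the authors' method. It assumes text and image embeddings already live in a shared space, that cluster centroids were computed offline, and that cosine similarity is the ranking score. The names `retrieve`, `n_probe`, and `top_k`, as well as the toy data, are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, centroids, clusters, n_probe=1, top_k=2):
    """Return up to top_k image ids, best match first (hypothetical helper)."""
    # Stage 1: keep only the n_probe clusters whose centroids best match the query.
    nearest = heapq.nlargest(
        n_probe, range(len(centroids)), key=lambda c: cosine(query, centroids[c])
    )
    # Stage 2: score images in those clusters, holding the best top_k
    # in a bounded min-heap (the priority queue).
    heap = []
    for c in nearest:
        for image_id, embedding in clusters[c]:
            item = (cosine(query, embedding), image_id)
            if len(heap) < top_k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)
    return [image_id for _, image_id in sorted(heap, reverse=True)]


# Toy 2-D embeddings, invented for illustration only.
centroids = [[1.0, 0.0], [0.0, 1.0]]
clusters = [
    [("apple.jpg", [0.9, 0.1]), ("pear.jpg", [0.8, 0.3])],
    [("milk.jpg", [0.1, 0.9]), ("yogurt.jpg", [0.2, 0.8])],
]
query = [1.0, 0.05]  # stands in for an encoded text query

results = retrieve(query, centroids, clusters)
print(results)
```

Restricting the search to a few clusters trades some recall for speed, and the bounded heap keeps memory proportional to `top_k` rather than to the number of candidates scored.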
Pages: 1089-1093
Page count: 5