Real-time matching between customer demands and product information via text-image retrieval remains a fundamental problem in intelligent retailing. However, this process involves challenges spanning data quality, multi-modal retrieval strategies, and retrieval efficiency. To address these challenges, we propose a cross-modal retrieval pipeline that leverages a contrastive loss and a novel sampling strategy. We formulate text-image retrieval as a two-stage process comprising unsupervised clustering and contrastive feature representation. Additionally, we construct an image-caption matching dataset by expanding the Grocery Store Dataset with a foundation vision-language model. Our experiments demonstrate the effectiveness of our method on both the expanded dataset and the well-known cross-modal retrieval benchmark, Flickr30k.
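To make the contrastive objective concrete, the following is a minimal NumPy sketch of a symmetric InfoNCE-style loss over paired image and text embeddings, the standard formulation for cross-modal contrastive learning. The function name, the temperature value, and the use of in-batch negatives are illustrative assumptions and do not reproduce the paper's actual implementation or sampling strategy.

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss (illustrative sketch, not the paper's code).

    img_emb, txt_emb: arrays of shape (batch, dim); row i of each is a
    matched image-caption pair, and all other rows act as in-batch negatives.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by an assumed temperature.
    logits = img @ txt.T / temperature

    def cross_entropy_diag(l):
        # Softmax cross-entropy with the matched pair (diagonal) as target.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(l)
        p /= p.sum(axis=1, keepdims=True)
        return -np.log(np.diag(p)).mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))
```

With perfectly aligned pairs (identical embeddings) the loss approaches zero, while mismatched pairs yield a larger value; training pushes embeddings of matched image-caption pairs together and unmatched ones apart.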