From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search

Cited by: 0

Authors:

Sun, Jintao [1]

Fei, Hao [2]

Ding, Gangyi [1]

Zheng, Zhedong [3,4]

Affiliations:

[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China

[2] Natl Univ Singapore, Sch Comp, Singapore, Singapore

[3] Univ Macau, Fac Sci & Technol, Macau, Peoples R China

[4] Univ Macau, Inst Collaborat Innovat, Macau, Peoples R China

Source:

PROCEEDINGS OF THE ACM WEB CONFERENCE 2025, WWW 2025 | 2025

Keywords:

Text-based Person Search; Data-centric Learning; Low-Rank Adaptation; Visual-language Pre-training; NETWORK;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203; 0835;

Abstract:

In text-based person search, data generation has become a prevailing practice because it addresses privacy concerns and avoids the arduous task of manual annotation. Although the amount of synthesized data can in theory be infinite, an open scientific question remains: how much generated data best serves subsequent model training? We observe that only a subset of the data in these constructed datasets plays a decisive role. Therefore, we introduce a new Filtering-WoRA paradigm, which contains a filtering algorithm to identify this crucial data subset and a WoRA (Weighted Low-Rank Adaptation) learning strategy for light fine-tuning. The filtering algorithm uses cross-modality relevance to remove the many coarsely matched synthetic pairs. As the amount of data decreases, we no longer need to fine-tune the entire model. Therefore, we propose the WoRA learning strategy to efficiently update a minimal portion of the model parameters. WoRA streamlines the learning process, enabling more efficient knowledge extraction from fewer yet more informative data instances. Extensive experiments validate the efficacy of pretraining, where our model achieves advanced and efficient retrieval performance on challenging real-world benchmarks. Notably, on the CUHK-PEDES dataset, we achieve a competitive mAP of 67.02% while reducing model training time by 19.82%.
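The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how cross-modality relevance is scored or where the cut-off lies, so this sketch assumes cosine similarity between precomputed image and text embeddings and a fixed threshold. The function names `cosine_similarity` and `filter_pairs`, the toy embeddings, and the threshold value of 0.5 are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(
    image_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[int]:
    """Return indices of image-text pairs whose relevance meets the threshold."""
    if len(image_embs) != len(text_embs):
        raise ValueError("image_embs and text_embs must have the same length")
    return [
        i
        for i, (img, txt) in enumerate(zip(image_embs, text_embs))
        if cosine_similarity(img, txt) >= threshold
    ]


# Toy data: pair 1 is a coarse (mismatched) pair and should be dropped.
image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_embs = [[1.0, 0.1], [1.0, 0.0], [0.9, 1.1]]
kept = filter_pairs(image_embs, text_embs, threshold=0.5)
print(kept)
```

In practice the embeddings would come from a pretrained vision-language model, and the threshold would be tuned to the desired size of the retained subset; both are left open here because the abstract does not state them.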


Pages: 2341-2351

Page count: 11

