Large-Scale Pre-training for Person Re-identification with Noisy Labels

Cited by: 48
Authors
Fu, Dengpan [1]
Chen, Dongdong [3]
Yang, Hao [2]
Bao, Jianmin [2]
Yuan, Lu [3]
Zhang, Lei [4]
Li, Houqiang [1]
Wen, Fang [2]
Chen, Dong [2]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Microsoft Res, Redmond, WA 98052 USA
[3] Microsoft Cloud AI, Orlando, FL USA
[4] IDEA, Hangzhou, Peoples R China
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022
Funding
National Natural Science Foundation of China;
DOI
10.1109/CVPR52688.2022.00251
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
This paper addresses the problem of pre-training for person re-identification (Re-ID) with noisy labels. To set up the pre-training task, we apply a simple online multi-object tracking system to the raw videos of an existing unlabeled Re-ID dataset, "LUPerson", and build a noisy-labeled variant called "LUPerson-NL". Since these ID labels, derived automatically from tracklets, inevitably contain noise, we develop a large-scale Pre-training framework utilizing Noisy Labels (PNL), which consists of three learning modules: supervised Re-ID learning, prototype-based contrastive learning, and label-guided contrastive learning. In principle, jointly learning these three modules not only clusters similar examples around one prototype but also rectifies noisy labels based on the prototype assignment. We demonstrate that learning directly from raw videos, using spatial and temporal correlations as weak supervision, is a promising alternative for pre-training. This simple pre-training task provides a scalable way to learn state-of-the-art Re-ID representations from scratch on "LUPerson-NL" without bells and whistles. For example, when the same supervised Re-ID method, MGN, is applied, our pre-trained model improves mAP over its unsupervised pre-training counterpart by 5.7%, 2.2%, and 2.3% on CUHK03, DukeMTMC, and MSMT17, respectively. In small-scale or few-shot settings, the performance gain is even larger, suggesting that the learned representation transfers better. Code is available at https://github.com/DengpanFu/LUPerson-NL.
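The abstract describes the prototype-based components of PNL only at a high level. The following is a minimal NumPy sketch of how prototype assignment, label rectification, and a label-guided contrastive loss can fit together. It rests on loud assumptions: every function name, the temperature `tau`, the confidence `threshold`, and the momentum prototype update are hypothetical and are not taken from the authors' code. The relabeling rule (switch a tracklet-derived label to the most similar prototype only when that assignment is confident) is one plausible reading of "rectifies noisy labels based on the prototype assignment", not the paper's exact rule. The supervised Re-ID classification module is omitted.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(logits):
    """Numerically stable row-wise softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def prototype_scores(emb, prototypes, tau=0.1):
    """Softmax over cosine similarities between each embedding and every ID prototype."""
    return softmax(l2_normalize(emb) @ l2_normalize(prototypes).T / tau)


def prototype_contrastive_loss(emb, labels, prototypes, tau=0.1):
    """Cross-entropy over prototype similarities; minimizing it pulls each embedding
    toward the prototype of its label and away from the other prototypes."""
    probs = prototype_scores(emb, prototypes, tau)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))


def rectify_labels(emb, noisy_labels, prototypes, tau=0.1, threshold=0.8):
    """Hypothetical rule: adopt the best-matching prototype's ID only when its
    score exceeds `threshold`; otherwise keep the tracklet-derived label."""
    probs = prototype_scores(emb, prototypes, tau)
    confident = probs.max(axis=1) > threshold
    return np.where(confident, probs.argmax(axis=1), noisy_labels)


def update_prototypes(prototypes, emb, labels, momentum=0.9):
    """Assumed momentum update: move each prototype toward its assigned embeddings."""
    prototypes = prototypes.copy()
    for z, y in zip(l2_normalize(emb), labels):
        prototypes[y] = l2_normalize(momentum * prototypes[y] + (1 - momentum) * z)
    return prototypes


def label_guided_contrastive_loss(emb, labels, tau=0.1):
    """Supervised-contrastive style loss: samples sharing a (rectified) label are positives."""
    z = l2_normalize(emb)
    sim = z @ z.T / tau
    n = len(labels)
    np.fill_diagonal(sim, -np.inf)  # exclude self-pairs from the denominator
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    per_anchor = [-log_prob[i, pos[i]].mean() for i in range(n) if pos[i].any()]
    return float(np.mean(per_anchor)) if per_anchor else 0.0


# Toy usage: 3 identities in an 8-d embedding space, one simulated ID switch.
rng = np.random.default_rng(0)
prototypes = l2_normalize(np.eye(3, 8))
labels = np.array([0, 0, 1, 1, 2, 2])
emb = prototypes[labels] + 0.05 * rng.standard_normal((6, 8))
noisy = labels.copy()
noisy[1] = 2  # tracking error: sample 1 was assigned to the wrong tracklet ID
fixed = rectify_labels(emb, noisy, prototypes)
print("noisy:", noisy, "rectified:", fixed)
```

In this toy setting, the corrupted label of sample 1 is restored because its embedding is confidently closest to prototype 0, and both losses drop once the rectified labels are used. Gating relabeling on a confidence threshold is a design choice meant to avoid flipping labels on ambiguous samples.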
Pages: 2466-2476
Number of pages: 11