CycAs: Self-supervised Cycle Association for Learning Re-identifiable Descriptions

被引:68
作者
Wang, Zhongdao [1 ]
Zhang, Jingwei [1 ]
Zheng, Liang [2 ]
Liu, Yixuan [1 ]
Sun, Yifan [3 ]
Li, Yali [1 ]
Wang, Shengjin [1 ]
机构
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
[2] Australian Natl Univ, Canberra, Australia
[3] MEGVII Technol, Beijing, Peoples R China
来源
COMPUTER VISION - ECCV 2020, PT XI | 2020年 / 12356卷
关键词
Self-supervised; Cycle consistency; person re-ID;
D O I
10.1007/978-3-030-58621-8_5
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem, where existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering. A potential drawback of using pseudo labels is that errors may accumulate and it is challenging to estimate the number of pseudo IDs. We introduce a different unsupervised method that allows us to learn pedestrian embeddings from raw videos, without resorting to pseudo labels. The goal is to construct a self-supervised pretext task that matches the person re-ID objective. Inspired by the data association concept in multi-object tracking, we propose the Cycle Association (CycAs) task: after performing data association between a pair of video frames forward and then backward, a pedestrian instance is supposed to be associated to itself. To fulfill this goal, the model must learn a meaningful representation that can well describe correspondences between instances in frame pairs. We adapt the discrete association process to a differentiable form, such that end-to-end training becomes feasible. Experiments are conducted in two aspects: We first compare our method with existing unsupervised re-ID methods on seven benchmarks and demonstrate CycAs' superiority. Then, to further validate the practical value of CycAs in real-world applications, we perform training on self-collected videos and report promising performance on standard test sets.
引用
收藏
页码:72 / 88
页数:17
相关论文
共 45 条
[1]   Deep Clustering for Unsupervised Learning of Visual Features [J].
Caron, Mathilde ;
Bojanowski, Piotr ;
Joulin, Armand ;
Douze, Matthijs .
COMPUTER VISION - ECCV 2018, PT XIV, 2018, 11218 :139-156
[2]  
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[3]   Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification [J].
Deng, Weijian ;
Zheng, Liang ;
Ye, Qixiang ;
Kang, Guoliang ;
Yang, Yi ;
Jiao, Jianbin .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :994-1003
[4]   Unsupervised Visual Representation Learning by Context Prediction [J].
Doersch, Carl ;
Gupta, Abhinav ;
Efros, Alexei A. .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :1422-1430
[5]   Unsupervised Person Re-identification: Clustering and Fine-tuning [J].
Fan, Hehe ;
Zheng, Liang ;
Yan, Chenggang ;
Yang, Yi .
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2018, 14 (04)
[6]  
Gidaris S., 2018, P INT C LEARN REPR
[7]   Digging Into Self-Supervised Monocular Depth Estimation [J].
Godard, Clement ;
Mac Aodha, Oisin ;
Firman, Michael ;
Brostow, Gabriel .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :3827-3837
[8]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[9]  
Hirzer M, 2011, LECT NOTES COMPUT SC, V6688, P91, DOI 10.1007/978-3-642-21227-7_9
[10]  
Lai Z., 2019, BMVC