Multi-Scale Explicit Matching and Mutual Subject Teacher Learning for Generalizable Person Re-Identification

Cited by: 1
Authors
Chen, Kaixiang [1 ]
Fang, Pengfei [2 ]
Ye, Zi [1 ]
Zhang, Liyan [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 211106, Peoples R China
[2] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Person re-identification; domain generalization; multi-scale; mutual-teacher;
DOI
10.1109/TCSVT.2024.3382322
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Subject Classification Code
0808 ; 0809 ;
Abstract
Domain generalization in person re-identification (DG-ReID) is among the most challenging and practically important branches of the ReID field, as it enables the direct deployment of pre-trained models in unseen, real-world scenarios. Recent works have tackled this task via the image-matching paradigm, which searches for local correspondences in the feature maps. Pixel-wise matching is commonly employed for efficiency; however, it makes the matching susceptible to deviations caused by identity-irrelevant pixel features. Patch-wise matching, on the other hand, disregards the spatial orientation of pedestrians and amplifies the impact of noise. To address these issues, this paper proposes the Multi-Scale Query-Adaptive Convolution (QAConv-MS) framework, which encodes patches of the feature maps into pixels using template kernels of various scales, granting the matching process broader receptive fields and robustness to orientation changes and noise. To stabilize the matching process and let each sub-kernel within the template kernels learn independently to capture diverse local patterns, we propose the OrthoGonal Norm (OGNorm), which consists of two orthogonal normalizations. We further present Mutual Subject Teacher Learning (MSTL) to counter potential overconfidence and overfitting: two models individually select the most challenging data for training, yielding more dependable soft labels that provide mutual supervision. Extensive experiments in both single-source and multi-source setups offer compelling evidence of the framework's generalization ability and competitiveness.
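The core matching idea in the abstract can be sketched as follows. This is a minimal, illustrative Python/NumPy sketch under assumed details (cosine-normalized local patches used as template kernels, best-match score aggregation over two scales); it is not the authors' implementation, and the function name and scale choices are hypothetical:

```python
import numpy as np

def multi_scale_match(query_fmap, gallery_fmap, scales=(1, 3)):
    """Illustrative multi-scale query-adaptive matching (not the paper's code):
    patches of the query feature map act as template kernels that are
    correlated against the gallery feature map at several scales."""
    H, W, C = query_fmap.shape
    scores = []
    for k in scales:
        # Extract k x k query patches (stride k, for brevity) as template kernels.
        for y in range(0, H - k + 1, k):
            for x in range(0, W - k + 1, k):
                kernel = query_fmap[y:y + k, x:x + k, :]
                kernel = kernel / (np.linalg.norm(kernel) + 1e-8)
                # Correlate the kernel against every valid gallery position and
                # keep the best local response (cosine similarity).
                best = -np.inf
                for gy in range(0, H - k + 1):
                    for gx in range(0, W - k + 1):
                        patch = gallery_fmap[gy:gy + k, gx:gx + k, :]
                        patch = patch / (np.linalg.norm(patch) + 1e-8)
                        best = max(best, float(np.sum(kernel * patch)))
                scores.append(best)
    # Aggregate the local best-match scores into a single similarity.
    return float(np.mean(scores))
```

Larger kernels (k > 1) give each template a wider receptive field, which is the mechanism the abstract credits for robustness to pedestrian orientation and pixel-level noise.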
Pages: 8881-8895
Page count: 15