Unified Conditional Image Generation for Visible-Infrared Person Re-Identification

Cited by: 9
Authors
Pan, Honghu [1 ]
Pei, Wenjie [1 ]
Li, Xin [2 ]
He, Zhenyu [1 ]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Image synthesis; Pedestrians; Diffusion models; Generators; Noise reduction; Generative adversarial networks; Visible-infrared person re-identification; diffusion probabilistic model; adversarial training;
DOI
10.1109/TIFS.2024.3426335
Chinese Library Classification
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
This paper proposes a unified multi-modal image generation method to address two critical challenges in visible-infrared (VI) person re-identification (ReID): the insufficiency of training samples and the large cross-modality discrepancy. Specifically, we generate cross-modal and middle-modal images to explicitly reduce the modality discrepancy, and generate intra-modal images to serve as training samples for dataset augmentation. To this end, we adapt a conditional diffusion model for multi-modal image generation. The condition comprises a binary modality indicator and a modality-irrelevant pedestrian contour, which control the target modality and the pedestrian identity, respectively. For intra-modality and cross-modality image generation, we modify the UNet architecture to take the conditions as input, and estimate the conditional probability density by optimizing its variational lower bound. Furthermore, we devise modality discriminators and adversarial training strategies to achieve modality alignment. The middle-modality image generation method shares the same network architecture as intra- and cross-modality generation, but has its own training objectives. We define the middle modality as the distribution equidistant from the visible and infrared modalities. We employ adversarial training to measure the distance from each of the visible and infrared modalities to the middle modality, and minimize the difference between the two adversarial losses as an equidistant constraint. Experimental results on SYSU-MM01 and RegDB demonstrate the effectiveness and generalization of the intra-modality, cross-modality, and middle-modality image generation.
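The equidistant constraint described in the abstract can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical rendering under stated assumptions, not the authors' implementation: ModalDiscriminator, adversarial_distance, middle_modality_loss, and lambda_eq are invented names, the discriminator architecture is a placeholder, and the paper's actual losses may differ in form. It shows only the idea of measuring an adversarial "distance" from generated middle-modality images to each real modality and penalizing the gap between the two distances.

# Minimal sketch of the equidistant constraint (assumed formulation, not the
# authors' code). One discriminator per real modality scores whether an image
# looks visible or infrared; the generator-side adversarial loss against each
# serves as a proxy distance to that modality.

import torch
import torch.nn as nn

class ModalDiscriminator(nn.Module):
    """Placeholder conv net scoring membership in one real modality."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.net(x)

bce = nn.BCEWithLogitsLoss()

def adversarial_distance(disc, fake_mid):
    # Generator-side adversarial loss: small when the generated
    # middle-modality images fool this modality's discriminator.
    logits = disc(fake_mid)
    return bce(logits, torch.ones_like(logits))

def middle_modality_loss(d_vis, d_ir, fake_mid, lambda_eq: float = 1.0):
    # Proxy distances from the middle modality to visible and infrared.
    loss_vis = adversarial_distance(d_vis, fake_mid)
    loss_ir = adversarial_distance(d_ir, fake_mid)
    # Equidistant constraint: the two distances should coincide.
    eq = (loss_vis - loss_ir).abs()
    return loss_vis + loss_ir + lambda_eq * eq

# Usage with stand-in data (hypothetical shapes):
d_vis, d_ir = ModalDiscriminator(), ModalDiscriminator()
fake_mid = torch.randn(4, 3, 128, 64)  # stand-in for generated middle-modality images
loss = middle_modality_loss(d_vis, d_ir, fake_mid)

Penalizing |loss_vis - loss_ir| is one direct way to express "equidistant from both modalities"; the paper's exact weighting and discriminator design would need to be taken from the full text.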
Pages: 9026-9038
Page count: 13