Unified Conditional Image Generation for Visible-Infrared Person Re-Identification

Cited by: 9

Authors:

Pan, Honghu [1]

Pei, Wenjie [1]

Li, Xin [2]

He, Zhenyu [1]

Affiliations:

[1] Harbin Institute of Technology, School of Computer Science and Technology, Shenzhen 518055, People's Republic of China

[2] Peng Cheng Laboratory, Shenzhen 518066, People's Republic of China

Source:

IEEE Transactions on Information Forensics and Security | 2024 / Vol. 19

Funding:

National Natural Science Foundation of China;

Keywords:

Training; Image synthesis; Pedestrians; Diffusion models; Generators; Noise reduction; Generative adversarial networks; Visible-infrared person re-identification; diffusion probabilistic model; adversarial training;

DOI:

10.1109/TIFS.2024.3426335

Chinese Library Classification (CLC) number:

TP301 [Theory and methods];

Discipline classification code:

081202 ;

Abstract:

This paper proposes a unified multi-modal image generation method to address two critical challenges in visible-infrared (VI) person re-identification (ReID): the insufficiency of training samples and the large cross-modality discrepancy. Specifically, we propose to generate cross-modal and middle-modal images to explicitly reduce the modality discrepancy, and to generate intra-modal images that serve as training samples for dataset augmentation. To this end, we adapt the conditional diffusion model for multi-modal image generation. The condition includes a binary modality indicator and a modality-irrelevant pedestrian contour, which control the target modality and the pedestrian identity, respectively. For intra-modality and cross-modality image generation, we modify the structure of UNet to take the conditions as input, and estimate the conditional probability density by optimizing its variational lower bound. Furthermore, we devise modal discriminators and adversarial training strategies to achieve modality alignment. The middle-modality image generation method shares the same network architecture as intra- and cross-modality generation, but has specific training objectives. We define the middle modality as the distribution equidistant from the visible modality and the infrared modality. We employ adversarial training to measure the distance from the visible or infrared modality to the middle modality, and then minimize the difference between these two adversarial losses, which serves as an equidistant constraint. Experimental results on SYSU-MM01 and RegDB demonstrate the effectiveness and generalization of the intra-modality, cross-modality, and middle-modality image generation.
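The equidistant constraint described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of binary cross-entropy as the adversarial loss, and the representation of discriminator outputs as plain probabilities are all assumptions made for illustration. The sketch only shows the idea of penalizing the gap between two adversarial losses so that generated middle-modality images sit equally far from both modalities.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for a single predicted probability (assumed loss form)."""
    eps = 1e-12
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def mean_adv_loss(probs):
    """Hypothetical generator-side adversarial loss: mean BCE against target 1.

    `probs` are discriminator outputs for generated middle-modality images,
    i.e. the estimated probability that each image belongs to one modality.
    """
    return sum(bce(p, 1.0) for p in probs) / len(probs)

def equidistant_loss(probs_from_visible_disc, probs_from_infrared_disc):
    """Absolute difference of the two adversarial losses (assumed form).

    It is zero when the middle-modality images are judged equally far
    from the visible and the infrared modality.
    """
    loss_vis = mean_adv_loss(probs_from_visible_disc)
    loss_ir = mean_adv_loss(probs_from_infrared_disc)
    return abs(loss_vis - loss_ir)
```

For example, `equidistant_loss([0.5, 0.5], [0.5, 0.5])` is zero because both discriminators are equally uncertain, whereas scores skewed toward one modality produce a positive penalty.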

Citation

Pages: 9026-9038

Page count: 13

References: 61 in total

[21] Kingma DP, 2014, arXiv:1312.6114, DOI 10.48550/arXiv.1312.6114

[22] Pan HH, 2022, arXiv:2210.01585

[23] Pose-Aided Video-Based Person Re-Identification via Recurrent Graph Convolutional Network [J]. Pan, Honghu; Liu, Qiao; Chen, Yongyong; He, Yunqi; Zheng, Yuan; Zheng, Feng; He, Zhenyu. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(12): 7183-7196

[24] Toward Complete-View and High-Level Pose-Based Gait Recognition [J]. Pan, Honghu; Chen, Yongyong; Xu, Tingyang; He, Yunqi; He, Zhenyu. IEEE Transactions on Information Forensics and Security, 2023, 18: 2104-2118

[25] Multi-granularity graph pooling for video-based person re-identification [J]. Pan, Honghu; Chen, Yongyong; He, Zhenyu. Neural Networks, 2023, 160: 22-33

[26] AAGCN: Adjacency-aware Graph Convolutional Network for person re-identification [J]. Pan, Honghu; Bai, Yang; He, Zhenyu; Zhang, Chunkai. Knowledge-Based Systems, 2022, 236

[27] TCDesc: Learning Topology Consistent Descriptors for Image Matching [J]. Pan, Honghu; Chen, Yongyong; He, Zhenyu; Meng, Fanyang; Fan, Nana. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(5): 2845-2855

[28] High-Resolution Image Synthesis with Latent Diffusion Models [J]. Rombach, Robin; Blattmann, Andreas; Lorenz, Dominik; Esser, Patrick; Ommer, Bjoern. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022: 10674-10685

[29] U-Net: Convolutional Networks for Biomedical Image Segmentation [J]. Ronneberger, Olaf; Fischer, Philipp; Brox, Thomas. Medical Image Computing and Computer-Assisted Intervention, Pt III, 2015, 9351: 234-241

[30] Image Super-Resolution via Iterative Refinement [J]. Saharia, Chitwan; Ho, Jonathan; Chan, William; Salimans, Tim; Fleet, David J.; Norouzi, Mohammad. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(4): 4713-4726
