Cascaded Cross-modal Alignment for Visible-Infrared Person Re-Identification

Cited by: 1
Authors
Li, Zhaohui [1]
Wang, Qiangchang [1]
Chen, Lu [1]
Zhang, Xinxin [1]
Yin, Yilong [1]
Affiliations
[1] Shandong Univ, Sch Software, Jinan 250101, Peoples R China
Keywords
Masking on frequency; Prototypes; Data augmentation; Visible-Infrared Person Re-Identification; Cross-modal Alignment
DOI
10.1016/j.knosys.2024.112585
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visible-Infrared Person Re-Identification faces significant challenges due to cross-modal and intra-modal variations. Although existing methods explore semantic alignment from various angles, severe distribution shifts in heterogeneous data limit the effectiveness of single-level alignment approaches. To address this issue, we propose a Cascaded Cross-modal Alignment (CCA) framework that progressively eliminates distribution discrepancies and aligns semantic features from three complementary perspectives. First, at the input level, we propose a Channel-Spatial Recombination (CSR) strategy that reorganizes and preserves crucial details across the channel and spatial dimensions to reduce visual discrepancies between modalities, thereby narrowing the modality gap in the input images. Second, at the frequency level, we introduce a Low Frequency Masking (LFM) module that randomly masks low-frequency information to emphasize global details that CSR might overlook, thus driving comprehensive alignment of identity semantics. Third, at the part level, we design a Prototype-based Semantic Refinement (PSR) module to refine fine-grained features and mitigate the impact of irrelevant areas in LFM. Guided by global discriminative clues from LFM and by flipped views with pose variations, PSR accurately aligns body parts and enhances semantic consistency. Comprehensive experiments on the SYSU-MM01 and RegDB datasets demonstrate the superiority of the proposed CCA.
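The abstract does not say how LFM is implemented. As a rough illustration of what "randomly masking low-frequency information" can mean, the sketch below zeroes a small block around the zero-frequency term of a 2-D Fourier spectrum with some probability. The function name `low_frequency_mask` and the parameters `radius` and `prob` are assumptions made for this example and are not taken from the paper; the actual module may operate on feature maps, use a different mask shape, or differ in other ways.

```python
import numpy as np


def low_frequency_mask(image, radius=4, prob=0.5, rng=None):
    """Randomly zero a centered low-frequency block of a 2-D array's spectrum.

    Hypothetical illustration: `radius` and `prob` are assumed parameters,
    not values reported for the LFM module.
    """
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=np.float64)

    # With probability (1 - prob), return the input unchanged.
    if rng.random() >= prob:
        return image.copy()

    # Move the zero-frequency (DC) term to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    cy, cx = h // 2, w // 2

    # Zero a square block of the lowest frequencies around the center.
    y0, x0 = max(cy - radius, 0), max(cx - radius, 0)
    spectrum[y0:cy + radius + 1, x0:cx + radius + 1] = 0

    # Return to the spatial domain and keep the real part.
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real
```

Applied to a constant image with `prob=1.0`, this returns an all-zero array, because a constant image has energy only at the zero frequency. A pixel-wise checkerboard passes through unchanged for small `radius`, because its energy sits at the highest frequency.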
Pages: 12