Cascaded Cross-modal Alignment for Visible-Infrared Person Re-Identification

Cited by: 1
Authors
Li, Zhaohui [1 ]
Wang, Qiangchang [1 ]
Chen, Lu [1 ]
Zhang, Xinxin [1 ]
Yin, Yilong [1 ]
Affiliations
[1] Shandong Univ, Sch Software, Jinan 250101, Peoples R China
Keywords
Masking on frequency; Prototypes; Data augmentation; Visible-Infrared Person Re-Identification; Cross-modal Alignment;
DOI
10.1016/j.knosys.2024.112585
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visible-Infrared Person Re-Identification faces significant challenges due to cross-modal and intra-modal variations. Although existing methods explore semantic alignment from various angles, severe distribution shifts in heterogeneous data limit the effectiveness of single-level alignment approaches. To address this issue, we propose a Cascaded Cross-modal Alignment (CCA) framework that gradually eliminates distribution discrepancies and aligns semantic features from three complementary perspectives in a cascaded manner. First, at the input level, we propose a Channel-Spatial Recombination (CSR) strategy that strategically reorganizes and preserves crucial details along the channel and spatial dimensions to diminish visual discrepancies between modalities, thereby narrowing the modality gap in the input images. Second, at the frequency level, we introduce a Low Frequency Masking (LFM) module that emphasizes global details CSR might overlook by randomly masking low-frequency information, thus driving comprehensive alignment of identity semantics. Third, at the part level, we design a Prototype-based Semantic Refinement (PSR) module to refine fine-grained features and mitigate the impact of the irrelevant areas introduced by LFM. It accurately aligns body parts and enhances semantic consistency, guided by global discriminative cues from LFM and by flipped views with pose variations. Comprehensive experimental results on the SYSU-MM01 and RegDB datasets demonstrate the superiority of the proposed CCA.
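The low-frequency masking idea from the abstract can be sketched as a frequency-domain augmentation: transform an image to the frequency domain, zero out a randomly sized block of low-frequency components around the spectrum centre, and transform back. This is only a minimal NumPy illustration of the general technique; the masking radius, random policy, and per-channel handling here are assumptions, not the paper's exact LFM recipe.

```python
import numpy as np

def low_frequency_mask(img, max_radius=8, rng=None):
    """Randomly mask low-frequency components of a single-channel image.

    A hypothetical sketch of frequency-level masking: the block size is
    drawn uniformly from [1, max_radius], which is an assumption rather
    than the paper's specification.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    # Shift the spectrum so the zero-frequency (DC) term sits at the centre.
    spec = np.fft.fftshift(np.fft.fft2(img))
    cy, cx = h // 2, w // 2
    r = int(rng.integers(1, max_radius + 1))  # random half-size of the mask
    spec[cy - r:cy + r, cx - r:cx + r] = 0    # zero out low frequencies
    # Back to the spatial domain; discard the negligible imaginary part.
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))
```

Because the DC component is always inside the masked block, the output image has (numerically) zero mean: the augmentation removes coarse illumination structure and forces the network to rely on the remaining mid- and high-frequency identity cues.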
Pages: 12