Nyströmformer-based cross-modality transformer for visible-infrared person re-identification

Cited by: 1
Authors
Mishra, Ranjit Kumar [1 ]
Mondal, Arijit [1 ]
Mathew, Jimson [1 ]
Affiliations
[1] Indian Inst Technol Patna, Dept Comp Sci & Engn, Patna 801106, Bihar, India
Keywords
Nyströmformer; Higher-Order Attention; Person Re-identification; Cross-Modality Fusion;
DOI
10.1038/s41598-025-01226-5
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy, Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Person re-identification (Re-ID) aims to accurately match individuals across different camera views under varying conditions such as illumination, pose, and background, a critical task for surveillance and security applications. Traditional Re-ID systems operate solely in the visible spectrum, which limits their effectiveness under changing lighting conditions and at night. Leveraging the visible-infrared (VIS-IR) domain overcomes these limitations, as infrared imaging provides reliable information in low-light and night-time environments. However, integrating the VIS (visible) and IR (infrared) modalities introduces significant cross-modality discrepancies, posing a major challenge for feature alignment and fusion. To address this, we propose NiCTRAM: a Nyströmformer-based Cross-Modality Transformer designed for robust VIS-IR person re-identification. Our framework first extracts hierarchical features from both RGB and IR images through a shared convolutional neural network (CNN) backbone, preserving modality-specific characteristics. These features are then processed by parallel Nyströmformer encoders, which efficiently capture long-range dependencies in linear time using lightweight self-attention. To bridge the modality gap, a cross-attention fusion block lets RGB and IR features interact and integrates second-order covariance statistics to model higher-order correlations. The fused features are refined through projection layers and optimized for re-identification with a classification head. Extensive experiments on benchmark VIS-IR person Re-ID datasets demonstrate that NiCTRAM outperforms existing methods, achieving state-of-the-art accuracy and robustness by effectively addressing the cross-modality challenges inherent in VIS-IR Re-ID.
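The abstract's Nyströmformer encoders rely on the Nyström method to approximate full softmax self-attention through a small set of landmark tokens, reducing cost from quadratic to roughly linear in sequence length. The sketch below illustrates the general technique only; the landmark choice (segment-mean pooling), the function names, and the sizes are illustrative assumptions, not the paper's released implementation.

```python
# Sketch of Nystrom-approximated self-attention with m landmarks (m << n).
# softmax(Q K^T / sqrt(d)) V is approximated as F1 · pinv(A) · (F2 · V).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention(Q, K, V, m=8):
    """Approximate softmax attention using m landmark rows (hypothetical sketch)."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    # Landmarks: segment means of queries/keys (one simple pooling choice).
    segments = np.array_split(np.arange(n), m)
    Qm = np.stack([Q[s].mean(axis=0) for s in segments])  # (m, d)
    Km = np.stack([K[s].mean(axis=0) for s in segments])  # (m, d)
    F1 = softmax(Q @ Km.T * scale)    # (n, m): queries vs. landmark keys
    A = softmax(Qm @ Km.T * scale)    # (m, m): landmark-landmark kernel
    F2 = softmax(Qm @ K.T * scale)    # (m, n): landmark queries vs. all keys
    # F2 @ V is (m, d), so no (n, n) matrix is ever materialized.
    return F1 @ np.linalg.pinv(A) @ (F2 @ V)
```

Because only (n, m), (m, m), and (m, n) matrices are formed, memory and time grow linearly with n for fixed m, which is what makes this attention "lightweight" for long token sequences.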
On the SYSU-MM01 dataset, NiCTRAM surpasses the previous SOTA by 4.21% in Rank-1 accuracy and 2.79% in mAP in the all-search single-shot mode, with similar gains in multi-shot settings. It also outperforms existing methods on RegDB and LLCM, achieving up to 5.90% higher Rank-1 accuracy and 5.83% higher mAP in Thermal-to-Visible mode. We will make the code and the model available at https://github.com/Ranjitkm2007/NiCTRAM.
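The fusion step described in the abstract combines two ideas: cross-attention (tokens from one modality querying the other) and second-order covariance statistics over the fused features. A minimal sketch of both operations follows; the single-head attention, the pooling-by-covariance descriptor, and all names are illustrative assumptions rather than the paper's actual layers.

```python
# Sketch of cross-modality attention plus second-order (covariance) pooling.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(X_rgb, X_ir):
    """RGB tokens query IR tokens (single head, shared feature dim for brevity)."""
    d = X_rgb.shape[1]
    attn = softmax(X_rgb @ X_ir.T / np.sqrt(d))  # (n_rgb, n_ir) weights
    return attn @ X_ir                           # fused tokens, (n_rgb, d)

def covariance_pool(X):
    """Second-order statistics: flattened covariance over the token set."""
    Xc = X - X.mean(axis=0, keepdims=True)       # center features
    cov = Xc.T @ Xc / max(X.shape[0] - 1, 1)     # (d, d) covariance
    return cov.reshape(-1)                       # (d*d,) descriptor
```

The covariance descriptor captures pairwise feature co-activations, which is one common way to inject the "higher-order correlations" the abstract refers to before the projection layers and classification head.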
Pages: 19
Related Papers (46 total)
[1] Chai Z., Ling Y., Luo Z., Lin D., Jiang M., Li S. Dual-Stream Transformer With Distribution Alignment for Visible-Infrared Person Re-Identification. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(11): 6764-6776.
[2] Chen C.-F., Fan Q., Panda R. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 347-356.
[3] Choi S., Lee S., Kim Y., Kim T., Kim C. Hi-CMD: Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020: 10254-10263.
[4] Choromanski K., 2020. 9th International Conference on Learning Representations.
[5] Dai P.Y., 2018. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence: 677.
[6] Dosovitskiy A., 2021. arXiv, DOI arXiv:2010.11929.
[7] Feng Y., Chen F., Sun G., Wu F., Ji Y., Liu T., Liu S., Jing X.-Y., Luo J. Learning multi-granularity representation with transformer for visible-infrared person re-identification. Pattern Recognition, 2025, 164.
[8] Feng Y., Chen F., Yu J., Ji Y., Wu F., Liu S., Jing X.-Y. Homogeneous and heterogeneous relational graph for visible-infrared person re-identification. Pattern Recognition, 2025, 158.
[9] Feng Y., Yu J., Chen F., Ji Y., Wu F., Liu S., Jing X.-Y. Visible-Infrared Person Re-Identification via Cross-Modality Interaction Transformer. IEEE Transactions on Multimedia, 2023, 25: 7647-7659.
[10] Feng Y., Ji Y., Wu F., Gao G., Gao Y., Liu T., Liu S., Jing X.-Y., Luo J. Occluded Visible-Infrared Person Re-Identification. IEEE Transactions on Multimedia, 2023, 25: 1401-1413.