Nyströmformer-based cross-modality transformer for visible-infrared person re-identification

Cited by: 1
Authors
Mishra, Ranjit Kumar [1 ]
Mondal, Arijit [1 ]
Mathew, Jimson [1 ]
Affiliations
[1] Indian Inst Technol Patna, Dept Comp Sci & Engn, Patna 801106, Bihar, India
Keywords
Nyströmformer; Higher-Order Attention; Person Re-identification; Cross-Modality Fusion;
DOI
10.1038/s41598-025-01226-5
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy, Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Person re-identification (Re-ID) aims to accurately match individuals across different camera views under varying conditions such as illumination, pose, and background, a critical task for surveillance and security applications. Traditional Re-ID systems operate solely in the visible spectrum, which limits their effectiveness under changing lighting conditions and at night. Leveraging the visible-infrared (VIS-IR) domain overcomes these limitations, as infrared imaging provides reliable information in low-light and night-time environments. However, integrating the VIS (visible) and IR (infrared) modalities introduces significant cross-modality discrepancies, posing a major challenge for feature alignment and fusion. To address this, we propose NiCTRAM: a Nyströmformer-based Cross-Modality Transformer designed for robust VIS-IR person re-identification. Our framework first extracts hierarchical features from both RGB and IR images through a shared convolutional neural network (CNN) backbone, preserving modality-specific characteristics. These features are then processed by parallel Nyströmformer encoders, which efficiently capture long-range dependencies in linear time using lightweight self-attention. To bridge the modality gap, a cross-attention fusion block lets RGB and IR features interact and integrates second-order covariance statistics to model higher-order correlations. The fused features are refined through projection layers and optimized for re-identification with a classification head. Extensive experiments on benchmark VIS-IR person Re-ID datasets demonstrate that NiCTRAM outperforms existing methods, achieving state-of-the-art accuracy and robustness by effectively addressing the cross-modality challenges inherent in VIS-IR Re-ID.
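The abstract's Nyströmformer encoders rely on the Nyström method to approximate full softmax self-attention through a small set of landmark tokens, reducing cost from quadratic to roughly linear in sequence length. The sketch below illustrates the general technique only; the landmark choice (segment-mean pooling), the function names, and the sizes are illustrative assumptions, not the paper's released implementation.

```python
# Sketch of Nystrom-approximated self-attention with m landmarks (m << n).
# softmax(Q K^T / sqrt(d)) V is approximated as F1 · pinv(A) · (F2 · V).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention(Q, K, V, m=8):
    """Approximate softmax attention using m landmark rows (hypothetical sketch)."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    # Landmarks: segment means of queries/keys (one simple pooling choice).
    segments = np.array_split(np.arange(n), m)
    Qm = np.stack([Q[s].mean(axis=0) for s in segments])  # (m, d)
    Km = np.stack([K[s].mean(axis=0) for s in segments])  # (m, d)
    F1 = softmax(Q @ Km.T * scale)    # (n, m): queries vs. landmark keys
    A = softmax(Qm @ Km.T * scale)    # (m, m): landmark-landmark kernel
    F2 = softmax(Qm @ K.T * scale)    # (m, n): landmark queries vs. all keys
    # F2 @ V is (m, d), so no (n, n) matrix is ever materialized.
    return F1 @ np.linalg.pinv(A) @ (F2 @ V)
```

Because only (n, m), (m, m), and (m, n) matrices are formed, memory and time grow linearly with n for fixed m, which is what makes this attention "lightweight" for long token sequences.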
On the SYSU-MM01 dataset, NiCTRAM surpasses the previous SOTA by 4.21% in Rank-1 accuracy and 2.79% in mAP in the all-search single-shot mode, with similar gains in multi-shot settings. It also outperforms existing methods on RegDB and LLCM, achieving up to 5.90% higher Rank-1 accuracy and 5.83% higher mAP in Thermal-to-Visible mode. We will make the code and the model available at https://github.com/Ranjitkm2007/NiCTRAM.
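The fusion step described in the abstract combines two ideas: cross-attention (tokens from one modality querying the other) and second-order covariance statistics over the fused features. A minimal sketch of both operations follows; the single-head attention, the pooling-by-covariance descriptor, and all names are illustrative assumptions rather than the paper's actual layers.

```python
# Sketch of cross-modality attention plus second-order (covariance) pooling.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(X_rgb, X_ir):
    """RGB tokens query IR tokens (single head, shared feature dim for brevity)."""
    d = X_rgb.shape[1]
    attn = softmax(X_rgb @ X_ir.T / np.sqrt(d))  # (n_rgb, n_ir) weights
    return attn @ X_ir                           # fused tokens, (n_rgb, d)

def covariance_pool(X):
    """Second-order statistics: flattened covariance over the token set."""
    Xc = X - X.mean(axis=0, keepdims=True)       # center features
    cov = Xc.T @ Xc / max(X.shape[0] - 1, 1)     # (d, d) covariance
    return cov.reshape(-1)                       # (d*d,) descriptor
```

The covariance descriptor captures pairwise feature co-activations, which is one common way to inject the "higher-order correlations" the abstract refers to before the projection layers and classification head.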
Pages: 19
Related Papers (46 total)
[1] Chai Z., Ling Y., Luo Z., Lin D., Jiang M., Li S. Dual-Stream Transformer With Distribution Alignment for Visible-Infrared Person Re-Identification. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(11): 6764-6776.
[2] Chen C.-F., Fan Q., Panda R. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 347-356.
[3] Choi S., Lee S., Kim Y., Kim T., Kim C. Hi-CMD: Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020: 10254-10263.
[4] Choromanski K., 2020. 9th International Conference on Learning Representations.
[5] Dai P.Y., 2018. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence: 677.
[6] Dosovitskiy A., 2021. arXiv, DOI arXiv:2010.11929.
[7] Feng Y., Chen F., Sun G., Wu F., Ji Y., Liu T., Liu S., Jing X.-Y., Luo J. Learning multi-granularity representation with transformer for visible-infrared person re-identification. Pattern Recognition, 2025, 164.
[8] Feng Y., Chen F., Yu J., Ji Y., Wu F., Liu S., Jing X.-Y. Homogeneous and heterogeneous relational graph for visible-infrared person re-identification. Pattern Recognition, 2025, 158.
[9] Feng Y., Yu J., Chen F., Ji Y., Wu F., Liu S., Jing X.-Y. Visible-Infrared Person Re-Identification via Cross-Modality Interaction Transformer. IEEE Transactions on Multimedia, 2023, 25: 7647-7659.
[10] Feng Y., Ji Y., Wu F., Gao G., Gao Y., Liu T., Liu S., Jing X.-Y., Luo J. Occluded Visible-Infrared Person Re-Identification. IEEE Transactions on Multimedia, 2023, 25: 1401-1413.