CM-DASN: visible-infrared cross-modality person re-identification via dynamic attention selection network

Cited by: 0

Authors:

Li, Yuxin [1]

Lu, Hu [1]

Qin, Tingting [1]

Tu, Juanjuan [2]

Wu, Shengli [3]

Affiliations:

[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China

[2] Jiangsu Univ Sci & Technol, Sch Comp, Zhenjiang 212100, Jiangsu, Peoples R China

[3] Ulster Univ, Sch Comp, Belfast BT15 1ED, Northern Ireland

Source:

MULTIMEDIA SYSTEMS | 2025 / Vol. 31 / Issue 02

Keywords:

Person re-identification; Visible-infrared; Cross-modality; Vision transformer

DOI:

10.1007/s00530-025-01724-6

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Cross-modality person re-identification between RGB and IR images presents significant challenges due to substantial modality discrepancies. While existing approaches often focus on learning either modality-specific or modality-shared features, overemphasis on the former may hinder cross-modality matching, whereas the latter are more beneficial for this task. To address this challenge, we propose CM-DASN (Cross-Modality Dynamic Attention Selection Network), a novel approach based on dynamic attention optimization. The core of our method is the Dynamic Attention Selection Module (DASM), which adaptively selects the most effective combination of attention heads in the later stages of training, thereby balancing the learning of modality-shared and modality-specific features. We employ a softmax score-based feature selection mechanism to extract and enhance the most discriminative cross-modality feature representations. By alternating supervised learning of high-scoring modality-shared and modality-specific features in the later training stages, the model focuses on learning highly discriminative modality-shared features while retaining beneficial modality-specific information. Furthermore, we design a multi-stage, multi-scale cross-modality feature alignment strategy that learns cross-modality representations more effectively by aligning features of different scales in a phased, progressive manner. This approach considers both global structure and local details, thereby improving cross-modality person re-identification performance. Our method achieves higher cross-modality matching accuracy with minimal increases in model parameters and computational time. Extensive experiments on the SYSU-MM01 and RegDB datasets validate the effectiveness of our proposed framework, demonstrating that it outperforms most existing state-of-the-art approaches. The source code is available at https://github.com/hulu88/CM_DASN.
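
Illustrative note (not part of the published abstract): the abstract mentions a softmax score-based selection mechanism but gives no implementation detail. The following is only a minimal sketch of the general idea of ranking candidates by softmax score and keeping the top k. The function names `softmax` and `select_top_k`, the toy "head" vectors, the raw scores, and the choice to scale kept vectors by their score are all assumptions made for illustration; they are not taken from the paper or from the CM_DASN repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(features, raw_scores, k):
    """Keep the k feature vectors with the highest softmax scores.

    Returns (kept_indices, weighted_features). Scaling each kept vector
    by its score is an assumption of this sketch, not the paper's method.
    """
    probs = softmax(raw_scores)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = sorted(order[:k])  # restore original ordering of the kept items
    weighted = [[probs[i] * x for x in features[i]] for i in kept]
    return kept, weighted


# Hypothetical toy input: four "attention head" outputs with made-up scores.
heads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
scores = [0.1, 2.0, 0.3, 1.5]
kept, weighted = select_top_k(heads, scores, k=2)
print(kept)  # prints [1, 3]
```

In this toy run the two highest-scoring candidates (indices 1 and 3) are retained. How the actual DASM scores and combines attention heads should be checked against the linked source code.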

References

Pages: 14

49 entries in total

[1] Dual-Stream Transformer With Distribution Alignment for Visible-Infrared Person Re-Identification
Chai, Zehua
Ling, Yongguo
Luo, Zhiming
Lin, Dazhen
Jiang, Min
Li, Shaozi
[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (11) : 6764 - 6776
[2] Structure-Aware Positional Transformer for Visible-Infrared Person Re-Identification
Chen, Cuiqun
Ye, Mang
Qi, Meibin
Wu, Jingjing
Jiang, Jianguo
Lin, Chia-Wen
[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 2352 - 2364
[3] Learning shared features from specific and ambiguous descriptions for text-based person search
Cheng, Ke
Geng, Qikai
Huang, Shucheng
Tu, Juanjuan
Lu, Hu
[J]. MULTIMEDIA SYSTEMS, 2024, 30 (02)
[4] Dai PY, 2018, PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, P677
[5] Dosovitskiy Alexey, 2020, COMPUTER VISION PATT
[6] Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
Feng, Jiawei
Wu, Ancong
Zheng, Wei-Shi
[J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 22752 - 22761
[7] CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification
Fu, Chaoyou
Hu, Yibo
Wu, Xiang
Shi, Hailin
Mei, Tao
He, Ran
[J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11803 - 11812
[8] TransReID: Transformer-based Object Re-Identification
He, Shuting
Luo, Hao
Wang, Pichao
Wang, Fan
Li, Hao
Jiang, Wei
[J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 14993 - 15002
[9] Hu J, 2018, P IEEE C COMP VIS PA, P7132
[10] MSCMNet: Multi-scale Semantic Correlation Mining for Visible-Infrared Person Re-Identification
Hua, Xuecheng
Cheng, Ke
Lu, Hu
Tu, Juanjuan
Wang, Yuanquan
Wang, Shitong
[J]. PATTERN RECOGNITION, 2025, 159
