Graph-based Consistent Reconstruction and Alignment for imbalanced text-image person re-identification

Cited by: 0

Authors:

Du, Guodong [1]

Gong, Tiantian [1]

Zhang, Liyan [1]

Affiliations:

[1] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology, Nanjing, People's Republic of China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2025 / Vol. 260

Funding:

National Natural Science Foundation of China;

Keywords:

Person re-identification; Image-text retrieval; Cross-modal alignment; Modality imbalance; Robustness;

DOI:

10.1016/j.eswa.2024.125429

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text-image person re-identification (TIReID) has emerged as a versatile approach for retrieving target pedestrians using textual descriptions. However, current TIReID research is overly idealistic and overlooks the data incompleteness and modal imbalance found in real-world application scenarios. In this paper, we therefore propose imbalanced text-image person re-identification (ITIReID) to address these problems. Compared with TIReID, ITIReID contains a larger proportion of unimodal data, which leads to modal imbalance. The ITIReID setting is more closely aligned with real-world scenarios, and studying it can broaden the applicability of TIReID. We propose a Graph-based Consistent Reconstruction and Alignment (GCRA) framework for ITIReID, which achieves modal balance by completing the missing modality features during training. Treating the accessible modality features as graph nodes, GCRA first builds an adjacency graph whose edges are weighted by a new semantic distance, which establishes semantic relevance between nodes by measuring both intra-modality and inter-modality correlation. GCRA then reconstructs the missing nodes, and thus the missing modality features, from existing nodes connected with high semantic relevance. To ensure that the reconstructed features are reliable and effective, we propose a proxy-based identity constraint and a reconstruction constraint. In addition, to enable effective semantic alignment using both the reconstructed and the original features, we introduce a cross-modal semantic constraint. Extensive experiments show that GCRA effectively handles data incompleteness and modal imbalance, demonstrating its superiority.


Pages: 14
