Graph-based Consistent Reconstruction and Alignment for imbalanced text-image person re-identification

Authors
Du, Guodong [1]
Gong, Tiantian [1]
Zhang, Liyan [1]
Affiliations
[1] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology, Nanjing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Person re-identification; Image-text retrieval; Cross-modal alignment; Modality imbalance; Robustness
DOI
10.1016/j.eswa.2024.125429
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-image person re-identification (TIReID) retrieves target pedestrians from textual descriptions. However, current TIReID research assumes idealized data and overlooks the data incompleteness and modal imbalance found in real-world applications. In this paper, we therefore propose imbalanced text-image person re-identification (ITIReID) to address these problems. Compared with TIReID, ITIReID contains a larger proportion of unimodal data, which leads to modal imbalance. The ITIReID setting is closer to real-world scenarios, and studying it can broaden the applicability of TIReID. We propose a Graph-based Consistent Reconstruction and Alignment (GCRA) framework for ITIReID, which achieves modal balance by completing the missing modality features during training. Treating the accessible modality features as graph nodes, GCRA first builds an adjacency graph whose edges are weighted by a new semantic distance, which establishes semantic relevance between nodes by measuring both intra-modality and inter-modality correlation. GCRA then reconstructs the missing nodes, and thereby the missing modality features, from existing nodes connected with high semantic relevance. To ensure that the reconstructed features are reliable and effective, we propose a proxy-based identity constraint and a reconstruction constraint. In addition, to enable effective semantic alignment using both the reconstructed and the original features, we introduce a cross-modal semantic constraint. Extensive experiments demonstrate that GCRA effectively handles data incompleteness and modal imbalance.
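The reconstruction step described in the abstract can be illustrated with a small sketch. This is not the GCRA implementation: the abstract does not define the semantic distance, so plain cosine similarity within one modality stands in for it here, and the softmax weighting, the function names `cosine` and `reconstruct_missing`, and the neighbour count `k` are all assumptions made for illustration. The sketch only shows the general idea of filling in a missing modality feature from the most similar nodes that have both modalities.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reconstruct_missing(known, donors, k=2):
    """Estimate the missing-modality feature of one sample (hypothetical helper).

    known  : the feature the sample does have (e.g. its image feature)
    donors : list of (same_modality_feature, other_modality_feature) pairs
             from samples that have both modalities
    k      : number of most similar donors (graph neighbours) to use
    """
    if not donors:
        raise ValueError("at least one complete sample is required")
    # Edge weights: similarity between the query node and each donor node.
    # Cosine similarity is a stand-in for the paper's semantic distance.
    scored = sorted(
        ((cosine(known, same), other) for same, other in donors),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Softmax over the kept edges so the weights are positive and sum to 1.
    exps = [math.exp(score) for score, _ in scored]
    total = sum(exps)
    dim = len(scored[0][1])
    # Weighted average of the neighbours' features in the missing modality.
    return [
        sum(w / total * other[i] for w, (_, other) in zip(exps, scored))
        for i in range(dim)
    ]


# Two complete samples: (image feature, text feature).
complete = [([1.0, 0.0], [1.0, 0.0, 0.0]), ([0.0, 1.0], [0.0, 1.0, 0.0])]
# A sample with only an image feature; estimate its text feature.
estimate = reconstruct_missing([1.0, 0.0], complete, k=2)
```

In this toy case the estimate leans toward the text feature of the first complete sample, because its image feature is the closer neighbour. The identity, reconstruction, and cross-modal constraints named in the abstract are training losses and are not modelled here.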
Pages: 14
Related papers
50 in total
  • [41] BDNet: A BERT-based dual-path network for text-to-image cross-modal person re-identification
    Liu, Qiang
    He, Xiaohai
    Teng, Qizhi
    Qing, Linbo
    Chen, Honggang
    PATTERN RECOGNITION, 2023, 141
  • [42] Pose-guided spatiotemporal alignment for video-based person Re-identification
    Gao, Changxin
    Chen, Yang
    Yu, Jin-Gang
    Sang, Nong
    INFORMATION SCIENCES, 2020, 527 : 176 - 190
  • [43] AAGNet: Attribute-Aware Graph-Based Network for Occluded Pedestrian Re-Identification
    Yao, Shihong
    Pan, Keyu
    Wang, Tao
    Zheng, Zhigao
    Jin, Jing
    Hu, Chuli
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (04) : 6580 - 6588
  • [44] Graph-Based Progressive Fusion Network for Multi-Modality Vehicle Re-Identification
    He, Qiaolin
    Lu, Zefeng
    Wang, Zihan
    Hu, Haifeng
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (11) : 12431 - 12447
  • [45] Person Re-identification Based on Body Segmentation
    Jiang, Hua
    Zhang, Liang
    COMPUTER VISION, PT II, 2017, 772 : 474 - 485
  • [46] Combined salience based person re-identification
    Choe, Gwangmin
    Yuan, Caihong
    Wang, Tianjiang
    Feng, Qi
    Hyon, Gyongil
    Choe, Chunhwa
    Ri, Jonghwan
    Ji, Gumhyok
    MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (18) : 11447 - 11468
  • [48] Person Re-identification Based on Visual Saliency
    Liu, Ying
    Shao, Yu
    Sun, Fuchun
    2012 12TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS (ISDA), 2012, : 884 - 889
  • [49] Deep pixel regeneration for occlusion reconstruction in person re-identification
    Nirbhay Kumar Tagore
    Prathistith Raj Medi
    Pratik Chattopadhyay
    Multimedia Tools and Applications, 2024, 83 : 4443 - 4463
  • [50] Resource-efficient Text-based Person Re-identification on Embedded Devices
    Agyeman, Rockson
    Rinner, Bernhard
    2024 20TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING IN SMART SYSTEMS AND THE INTERNET OF THINGS, DCOSS-IOT 2024, 2024, : 84 - 92