ASPD-Net: Self-aligned part mask for improving text-based person re-identification with adversarial representation learning

Cited by: 5
Authors
Wang, Zijie [1]
Xue, Jingyi [1]
Wan, Xili [1]
Zhu, Aichun [1, 2]
Li, Yifeng [1]
Zhu, Xiaomei [1]
Hu, Fangqiang [1]
Affiliations
[1] Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing, Peoples R China
[2] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Part mask detection; Text-based person re-identification; Adversarial learning; NETWORK
DOI
10.1016/j.engappai.2022.105419
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text-based person re-identification aims to retrieve images of the corresponding person from a large visual database according to a natural language description. For visual local information extraction, most state-of-the-art methods adopt either a strict uniform strategy, which can be too coarse to capture local details properly, or pre-processing with external cues, which may suffer from the deviations of the pre-trained model and from high computational cost. In this paper, we propose an Adversarial Self-aligned Part Detecting Network (ASPD-Net) that extracts and combines multi-granular visual and textual features. A novel Self-aligned Part Mask Module is presented to autonomously learn the information of human body parts and to obtain visual local features in a soft-attention manner using K Self-aligned Part Mask Detectors. Regarding the main model branches as a generator, a discriminator is employed to determine whether a representation vector comes from the visual modality or the textual modality. Trained with an Adversarial Loss, ASPD-Net can learn more robust representations, as long as it successfully tricks the discriminator. Experimental results demonstrate that the proposed ASPD-Net outperforms previous methods and achieves state-of-the-art performance on the CUHK-PEDES and RSTPReid datasets.
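The adversarial idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear discriminator, the binary cross-entropy objective, the flipped-label generator loss, and all function names below are assumptions chosen for illustration. The discriminator is rewarded for telling visual vectors (label 1) from textual vectors (label 0), while the generator side is rewarded when the discriminator gets it wrong.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminator(vec, weights, bias):
    # Hypothetical linear discriminator: estimated probability that
    # `vec` is a visual (rather than textual) representation.
    return sigmoid(sum(w * v for w, v in zip(weights, vec)) + bias)

def bce(prob, label, eps=1e-12):
    # Binary cross-entropy for a single prediction.
    return -(label * math.log(prob + eps)
             + (1.0 - label) * math.log(1.0 - prob + eps))

def discriminator_loss(visual_vecs, textual_vecs, weights, bias):
    # Discriminator objective: visual -> 1, textual -> 0.
    losses = [bce(discriminator(v, weights, bias), 1.0) for v in visual_vecs]
    losses += [bce(discriminator(t, weights, bias), 0.0) for t in textual_vecs]
    return sum(losses) / len(losses)

def generator_loss(visual_vecs, textual_vecs, weights, bias):
    # Assumed generator objective: the same loss with flipped labels, so it
    # is small only when the discriminator misjudges the modality.
    losses = [bce(discriminator(v, weights, bias), 0.0) for v in visual_vecs]
    losses += [bce(discriminator(t, weights, bias), 1.0) for t in textual_vecs]
    return sum(losses) / len(losses)

# Toy example: modality is trivially readable from the first coordinate,
# so the discriminator wins and the generator loss is large.
visual = [[2.0, 0.0]]
textual = [[-2.0, 0.0]]
d_loss = discriminator_loss(visual, textual, [1.0, 0.0], 0.0)
g_loss = generator_loss(visual, textual, [1.0, 0.0], 0.0)
```

In this toy setting the two modalities are easy to separate, so `d_loss` is smaller than `g_loss`. Training the feature extractors to reduce `g_loss` would push visual and textual vectors toward a shared, modality-agnostic space, which is the intuition behind tricking the discriminator.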
Pages: 12