ASPD-Net: Self-aligned part mask for improving text-based person re-identification with adversarial representation learning

Cited by: 5
Authors
Wang, Zijie [1]
Xue, Jingyi [1]
Wan, Xili [1]
Zhu, Aichun [1, 2]
Li, Yifeng [1]
Zhu, Xiaomei [1]
Hu, Fangqiang [1]
Affiliations
[1] Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing, Peoples R China
[2] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Part mask detection; Text-based person re-identification; Adversarial learning; Network
DOI
10.1016/j.engappai.2022.105419
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Text-based person re-identification aims to retrieve images of the corresponding person from a large visual database according to a natural language description. For visual local information extraction, most state-of-the-art methods adopt either a strict uniform strategy, which can be too coarse to capture local details properly, or pre-processing with external cues, which may suffer from the deviations of the pre-trained model and from high computational cost. In this paper, we propose an Adversarial Self-aligned Part Detecting Network (ASPD-Net), which extracts and combines multi-granular visual and textual features. A novel Self-aligned Part Mask Module is presented to autonomously learn the information of human body parts and to obtain visual local features in a soft-attention manner by using K Self-aligned Part Mask Detectors. Regarding the main model branches as a generator, a discriminator is employed to determine whether a representation vector comes from the visual modality or the textual modality. With Adversarial Loss training, ASPD-Net can learn more robust representations, provided it successfully fools the discriminator. Experimental results demonstrate that the proposed ASPD-Net outperforms previous methods and achieves state-of-the-art performance on the CUHK-PEDES and RSTPReid datasets.
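The two mechanisms named in the abstract (soft-attention part masks and a modality discriminator) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a sigmoid over a per-location linear score as the "detector", the mask-weighted average pooling, and the standard GAN-style cross-entropy losses are all assumptions made for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def part_mask_pool(feature_map, detectors):
    """Pool K local features from a feature map with soft masks.

    feature_map: list of N location vectors, each of length C.
    detectors:   list of K weight vectors of length C (hypothetical
                 stand-ins for the K part mask detectors).
    Returns K pooled vectors of length C.
    """
    local_features = []
    for weights in detectors:
        # Soft mask: one value in (0, 1) per spatial location.
        mask = [
            sigmoid(sum(w * f for w, f in zip(weights, location)))
            for location in feature_map
        ]
        total = sum(mask)
        # Mask-weighted average of the location vectors.
        pooled = [
            sum(m * location[c] for m, location in zip(mask, feature_map)) / total
            for c in range(len(weights))
        ]
        local_features.append(pooled)
    return local_features


def discriminator_loss(p_visual, p_textual):
    """Assumed GAN-style loss. Each argument is the discriminator's
    predicted probability that the given vector is visual."""
    return -(math.log(p_visual) + math.log(1.0 - p_textual))


def generator_loss(p_visual, p_textual):
    """Assumed adversarial loss for the main branches: reward the
    discriminator for confusing the two modalities."""
    return -(math.log(1.0 - p_visual) + math.log(p_textual))


if __name__ == "__main__":
    fmap = [[1.0, 0.0], [0.0, 1.0]]      # N = 2 locations, C = 2 channels
    dets = [[0.0, 0.0], [2.0, -2.0]]     # K = 2 hypothetical detectors
    print(part_mask_pool(fmap, dets))
    print(discriminator_loss(0.9, 0.1), generator_loss(0.9, 0.1))
```

In this sketch, a detector with large positive weight on one channel concentrates its mask on locations where that channel is active, which is the sense in which the pooling is "soft" rather than a fixed uniform split of the image.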
Pages: 12