ASPD-Net: Self-aligned part mask for improving text-based person re-identification with adversarial representation learning

Cited by: 5

Authors:

Wang, Zijie [1]

Xue, Jingyi [1]

Wan, Xili [1]

Zhu, Aichun [1,2]

Li, Yifeng [1]

Zhu, Xiaomei [1]

Hu, Fangqiang [1]

Affiliations:

[1] Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing, Peoples R China

[2] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou, Peoples R China

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2022 / Vol. 116

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Part mask detection; Text-based person re-identification; Adversarial learning; NETWORK;

DOI:

10.1016/j.engappai.2022.105419

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Text-based person re-identification aims to retrieve images of the corresponding person from a large visual database according to a natural language description. For visual local information extraction, most state-of-the-art methods adopt either a strict uniform partitioning strategy, which can be too coarse to capture local details properly, or pre-processing with external cues, which may suffer from the deviations of the pre-trained model and high computational cost. In this paper, we propose an Adversarial Self-aligned Part Detecting Network (ASPD-Net) that extracts and combines multi-granular visual and textual features. A novel Self-aligned Part Mask Module is presented to autonomously learn information about human body parts and to obtain visual local features in a soft-attention manner using K Self-aligned Part Mask Detectors. Treating the main model branches as a generator, a discriminator is employed to determine whether a representation vector comes from the visual modality or the textual modality. Trained with an adversarial loss, ASPD-Net learns more robust representations by learning to fool the discriminator. Experimental results demonstrate that the proposed ASPD-Net outperforms previous methods and achieves state-of-the-art performance on the CUHK-PEDES and RSTPReid datasets.
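The two mechanisms named in the abstract (soft-attention part masks and a modality discriminator trained adversarially) can be illustrated with a toy, pure-Python sketch. It is not the authors' implementation, and every name in it is hypothetical. Assumptions, none of which come from the paper: each Self-aligned Part Mask Detector is modeled as a single weight vector that scores every spatial position (similar to a 1x1 convolution) followed by a softmax over positions; the discriminator is a logistic-regression classifier that outputs the probability that an embedding is visual; and the adversarial objective is a standard binary cross-entropy with flipped labels, as in ordinary GAN training. The real detector layers and loss in ASPD-Net may differ.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def part_masks(features: List[Vector], detectors: List[Vector]) -> List[List[float]]:
    """Return K soft masks (one per hypothetical detector), each summing to 1.

    features:  N flattened spatial positions (N = H * W), each a C-dim vector.
    detectors: K hypothetical detector weight vectors of length C.
    """
    return [softmax([dot(w, f) for f in features]) for w in detectors]


def local_features(features: List[Vector], detectors: List[Vector]) -> List[Vector]:
    """Soft-attention pooling: part feature k = sum_p mask_k[p] * features[p]."""
    channels = len(features[0])
    parts = []
    for mask in part_masks(features, detectors):
        pooled = [0.0] * channels
        for weight, f in zip(mask, features):
            for i in range(channels):
                pooled[i] += weight * f[i]
        parts.append(pooled)
    return parts


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def p_visual(v: Vector, w: Vector, b: float) -> float:
    """Hypothetical linear discriminator: probability that v is a visual embedding."""
    return sigmoid(dot(w, v) + b)


def bce(p: float, label: float, eps: float = 1e-12) -> float:
    # Binary cross-entropy for a single prediction p against a 0/1 label.
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


def discriminator_loss(visual: Vector, textual: Vector, w: Vector, b: float) -> float:
    # The discriminator is trained to label visual embeddings 1 and textual ones 0.
    return bce(p_visual(visual, w, b), 1.0) + bce(p_visual(textual, w, b), 0.0)


def adversarial_loss(visual: Vector, textual: Vector, w: Vector, b: float) -> float:
    # The encoders (the "generator") are trained on flipped labels, i.e. to fool D.
    return bce(p_visual(visual, w, b), 0.0) + bce(p_visual(textual, w, b), 1.0)
```

Because each mask is a softmax over spatial positions, every part feature is a weighted average of the feature map, which is one simple way to realize "soft attention"; in a trained model the detector weights would be learned end to end rather than fixed. In standard GAN practice, `discriminator_loss` is minimized with respect to the discriminator while `adversarial_loss` is added to the encoders' objective; the abstract does not state how ASPD-Net schedules these updates.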

Pages: 12

References: 54
