ASPD-Net: Self-aligned part mask for improving text-based person re-identification with adversarial representation learning

Cited by: 5

Authors:

Wang, Zijie [1]

Xue, Jingyi [1]

Wan, Xili [1]

Zhu, Aichun [1,2]

Li, Yifeng [1]

Zhu, Xiaomei [1]

Hu, Fangqiang [1]

Affiliations:

[1] Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing, Peoples R China

[2] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou, Peoples R China

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2022 / Vol. 116

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Part mask detection; Text-based person re-identification; Adversarial learning; NETWORK;

DOI:

10.1016/j.engappai.2022.105419

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Text-based person re-identification aims to retrieve images of the corresponding person from a large visual database according to a natural language description. For visual local information extraction, most state-of-the-art methods adopt either a strict uniform strategy, which can be too coarse to capture local details properly, or pre-processing with external cues, which may suffer from the deviations of the pre-trained model and high computational cost. In this paper, we propose an Adversarial Self-aligned Part Detecting Network (ASPD-Net), which extracts and combines multi-granular visual and textual features. A novel Self-aligned Part Mask Module is presented to autonomously learn information about human body parts and to obtain visual local features in a soft-attention manner using K Self-aligned Part Mask Detectors. Regarding the main model branches as a generator, a discriminator is employed to determine whether a representation vector comes from the visual modality or the textual modality. With Adversarial Loss training, ASPD-Net can learn more robust representations, as long as it successfully tricks the discriminator. Experimental results demonstrate that the proposed ASPD-Net outperforms previous methods and achieves state-of-the-art performance on the CUHK-PEDES and RSTPReid datasets.
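The two mechanisms named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give the detector architecture or the exact loss, so the sketch assumes each of the K mask detectors is a single linear scorer followed by a sigmoid, that local features are mask-weighted averages over spatial locations, and that the discriminator is a logistic classifier trained with binary cross-entropy (the generator loss uses flipped labels). All function names and parameters below are hypothetical.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def part_mask_pool(feature_map, detectors):
    """Soft-attention pooling with K part mask detectors (assumed linear).

    feature_map: list of C-dim vectors, one per spatial location.
    detectors:   K (weights, bias) pairs; each yields one soft mask.
    Returns (local_feats, masks): K pooled C-dim vectors and K masks.
    """
    channels = len(feature_map[0])
    local_feats, masks = [], []
    for weights, bias in detectors:
        # One mask value in (0, 1) per spatial location.
        mask = [sigmoid(dot(weights, f) + bias) for f in feature_map]
        total = sum(mask)
        # Mask-weighted average of the features at all locations.
        pooled = [
            sum(m * f[c] for m, f in zip(mask, feature_map)) / total
            for c in range(channels)
        ]
        masks.append(mask)
        local_feats.append(pooled)
    return local_feats, masks


def discriminator(vec, weights, bias):
    # Assumed logistic classifier: probability that vec is visual.
    return sigmoid(dot(weights, vec) + bias)


def bce(p, label):
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


def adversarial_losses(visual_vecs, textual_vecs, d_weights, d_bias):
    """Returns (d_loss, g_loss), averaged over all representation vectors.

    d_loss rewards the discriminator for telling modalities apart;
    g_loss (flipped labels) rewards the generator for fooling it.
    """
    d_loss = g_loss = 0.0
    for vecs, label in ((visual_vecs, 1), (textual_vecs, 0)):
        for v in vecs:
            p = discriminator(v, d_weights, d_bias)
            d_loss += bce(p, label)
            g_loss += bce(p, 1 - label)
    n = len(visual_vecs) + len(textual_vecs)
    return d_loss / n, g_loss / n


# Toy usage: 4 spatial locations, C = 2 channels, K = 2 detectors.
feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
detectors = [([4.0, -4.0], 0.0), ([-4.0, 4.0], 0.0)]
local_feats, masks = part_mask_pool(feature_map, detectors)
d_loss, g_loss = adversarial_losses([[1.0, 0.0]], [[0.0, 1.0]], [3.0, -3.0], 0.0)
```

In this toy setup each detector attends to locations where a different channel dominates, so the two pooled vectors differ. When the discriminator separates the modalities well, d_loss is small and g_loss is large; training the generator to lower g_loss pushes the visual and textual representations toward a shared, modality-indistinguishable space.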

Pages: 12
