POAR: Towards Open Vocabulary Pedestrian Attribute Recognition

Cited by: 1

Authors:

Zhang, Yue ^{[1,2]}

Wang, Suchen ^{[3]}

Kan, Shichao ^{[4]}

Weng, Zhenyu ^{[3]}

Cen, Yigang ^{[5,6]}

Tan, Yap-peng ^{[3]}

Affiliations:

[1] Henan Normal Univ, Coll Comp & Informat Engn, Xinxiang, Henan, Peoples R China

[2] Henan Normal Univ, Key Lab Artificial Intelligence & Personalized Le, Xinxiang, Peoples R China

[3] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore

[4] Cent South Univ, Sch Comp Sci & Engn, Changsha, Peoples R China

[5] Beijing Jiaotong Univ, Inst Informat Sci, Beijing, Peoples R China

[6] Beijing Jiaotong Univ, Beijing Key Lab Adv Informat Sci & Network Techno, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

Pedestrian attribute recognition; CLIP; Open-attribute recognition; NETWORK;

DOI:

10.1145/3581783.3611719

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Pedestrian attribute recognition (PAR) aims to predict the attributes of a target pedestrian. Recent methods often address the PAR problem by training a multi-label classifier with predefined attribute classes, but they can hardly exhaust all possible pedestrian attributes in the real world. To tackle this problem, we propose a novel Pedestrian Open-Attribute Recognition (POAR) approach by formulating the problem as a task of image-text search. Our approach employs a Transformer-based Encoder with a Masking Strategy (TEMS) to focus on the attributes of specific pedestrian parts (e.g., head, upper body, lower body, feet, etc.), and introduces a set of attribute tokens to encode the corresponding attributes into visual embeddings. Each attribute category is described as a natural language sentence and encoded by the text encoder. Then, we compute the similarity between the visual and text embeddings to find the best attribute descriptions for the input images. To handle multiple attributes of a single pedestrian, we propose a Many-To-Many Contrastive (MTMC) loss with masked tokens. In addition, we propose a Grouped Knowledge Distillation (GKD) method to minimize the disparity between visual embeddings and unseen attribute text embeddings. We evaluate our proposed method on three PAR datasets with an open-attribute setting. The results demonstrate the effectiveness of our method as a strong baseline for the POAR task. Our code is available at https://github.com/IvyYZ/POAR.
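The image-text search formulation in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). It assumes each body part already has a visual embedding and each candidate attribute sentence already has a text embedding, and it simply picks the sentence with the highest cosine similarity. The function names, the per-part grouping, the example sentences, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_descriptions(visual_embeddings, text_embeddings):
    """Return the best-matching attribute sentence for each body part.

    visual_embeddings: dict mapping part name -> vector
        (stand-ins for attribute-token outputs; hypothetical).
    text_embeddings: dict mapping part name -> {sentence: vector}
        (stand-ins for text-encoder outputs; hypothetical).
    """
    result = {}
    for part, v in visual_embeddings.items():
        candidates = text_embeddings[part]
        # Pick the sentence whose embedding is most similar to the visual one.
        result[part] = max(candidates, key=lambda s: cosine(v, candidates[s]))
    return result


# Toy, hand-made vectors; a real system would obtain these from trained encoders.
visual = {"upper body": [0.9, 0.1, 0.0], "feet": [0.0, 0.2, 0.8]}
text = {
    "upper body": {
        "A pedestrian wearing a jacket.": [1.0, 0.0, 0.0],
        "A pedestrian wearing a T-shirt.": [0.0, 1.0, 0.0],
    },
    "feet": {
        "A pedestrian wearing boots.": [0.0, 0.0, 1.0],
        "A pedestrian wearing sandals.": [1.0, 0.0, 0.0],
    },
}
matches = best_descriptions(visual, text)
```

Because attribute classes enter only as sentences, an unseen attribute can be queried by adding a new sentence and its embedding to the candidate set, without changing the matching code.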


Pages: 655-665

Page count: 11
