Exploring attribute localization and correlation for pedestrian attribute recognition

Cited by: 14
Authors
Weng, Dunfang [1 ]
Tan, Zichang [2 ,3 ]
Fang, Liwei [1 ]
Guo, Guodong [2 ,3 ]
Affiliations
[1] AI PRIME, Dept Algorithm, Shanghai, Peoples R China
[2] Inst Deep Learning, Baidu Res, Beijing, Peoples R China
[3] Natl Engn Lab Deep Learning Technol & Applicat, Beijing, Peoples R China
Keywords
Pedestrian attribute recognition; Transformer; Attention; Deep learning; Person re-identification; Features; Network
DOI
10.1016/j.neucom.2023.02.019
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pedestrian Attribute Recognition (PAR) is an emerging research topic in the field of video surveillance. PAR usually requires analyzing dozens of attributes simultaneously, e.g., age, gender and clothing type. However, different attributes may focus on different image regions, which makes it difficult to concurrently extract exhaustive features for all attributes. Moreover, some of these attributes are highly correlated, which poses another challenge for pedestrian attribute recognition. To remedy these two issues, we propose two novel modules, namely the Attribute Localization Module (ALM) and the Attribute Correlation Module (ACM). The ALM is built on a multi-stream architecture in which each stream processes a specific attribute individually; an attention mechanism discovers and enhances attribute-related features while suppressing less important regions. The ACM employs the Transformer structure to effectively explore the correlations among different attributes. In particular, we place the Transformer blocks behind the ALM, regarding each attribute-specific feature as an input token. The ALM and ACM focus on different aspects, exploiting their interrelated and complementary information. We combine the proposed modules into a unified network, Exploring Attribute Localization and Correlation (EALC). Our approach is validated on five large-scale pedestrian attribute datasets, including PETA, RAP, PA-100K, Market-1501 and Duke. Experiments demonstrate the effectiveness and advancement of the proposed EALC. (C) 2023 Elsevier B.V. All rights reserved.
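The core mechanism behind the ACM, as described in the abstract, is to treat each attribute-specific feature vector as one token and let self-attention mix information across attributes. The following is a minimal numpy sketch of that idea, not the authors' implementation; the single attention layer, the projection matrices `Wq`, `Wk`, `Wv`, and the feature dimension are illustrative assumptions (the paper uses full Transformer blocks).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attribute_self_attention(tokens, Wq, Wk, Wv):
    """One scaled dot-product self-attention layer over attribute tokens.

    tokens: (num_attributes, d) -- one row per attribute-specific feature,
    i.e. the per-attribute outputs that the ALM would produce upstream.
    """
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    # Attribute-to-attribute affinities: correlated attributes (e.g.
    # "skirt" and "female") can attend to each other here.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v  # correlation-aware features

# Toy example: 5 hypothetical attributes, feature dimension 8.
rng = np.random.default_rng(0)
d = 8
tokens = rng.standard_normal((5, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = attribute_self_attention(tokens, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one refined feature per attribute
```

Each output row is a weighted mixture of all attribute features, which is how stacking such layers lets the network exploit attribute correlations before the final per-attribute classifiers.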
Pages: 140-150
Number of pages: 11