Improving Pedestrian Attribute Recognition with Multi-Scale Spatial Calibration

Cited by: 7

Authors:

Zhong, Jiabao [1,2]

Qiao, Hezhe [2]

Chen, Lin [2]

Shang, Mingsheng [2]

Liu, Qun [1]

Affiliations:

[1] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing 400065, Peoples R China

[2] Chinese Acad Sci, Chongqing Inst Green & Intelligent Technol, Chongqing Key Lab Big Data & Intelligent Comp, Chongqing 400714, Peoples R China

Source:

2021 International Joint Conference on Neural Networks (IJCNN) | 2021

Keywords:

Pedestrian attribute recognition; Fine-grained attribute recognition; Multi-scale feature fusion; Spatial Calibrated Module; Model

DOI:

10.1109/IJCNN52387.2021.9533647

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Pedestrian Attribute Recognition (PAR) has attracted increasing attention because it provides important structural information about pedestrians for smart video analysis. However, pedestrian images taken from a far distance significantly increase the difficulty of recognizing fine-grained attributes. To address this problem and further improve PAR performance, we propose a Multi-Scale Spatial Calibration (MSSC) module consisting of two submodules. First, a Spatial Calibrated Module (SCM) extracts more discriminative features of inconspicuous attributes from their surrounding regions by gathering contextual information across different receptive fields. Second, to build long-range dependencies among pyramid feature maps at different spatial scales, Multi-Scale Feature Fusion (MSFF) integrates multiple branches of low-level detail features and high-level semantic features through a non-local attention mechanism. Extensive experiments show that the proposed model achieves state-of-the-art results on three pedestrian attribute datasets: RAPv1, PA-100K, and RAPv2. In particular, it significantly improves the recognition of fine-grained attributes in low-resolution images in terms of mean accuracy (mA) and recall.
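The abstract reports results in terms of mean accuracy (mA) but does not define it. As a rough illustration, the sketch below assumes the label-based definition commonly used on PAR benchmarks: for each attribute, average the recall on positive samples and the recall on negative samples, then average over attributes. The function name `mean_accuracy` and the toy labels are hypothetical and are not taken from the paper.

```python
def mean_accuracy(y_true, y_pred):
    """Label-based mean accuracy (mA) over binary attributes.

    y_true, y_pred: lists of per-image lists of 0/1 labels,
    one entry per attribute (assumed layout, for illustration).
    """
    num_attrs = len(y_true[0])
    per_attr = []
    for j in range(num_attrs):
        tp = tn = pos = neg = 0
        for truth, pred in zip(y_true, y_pred):
            if truth[j] == 1:
                pos += 1
                tp += pred[j] == 1
            else:
                neg += 1
                tn += pred[j] == 0
        # Guard against attributes with no positive or no negative samples.
        pos_recall = tp / pos if pos else 0.0
        neg_recall = tn / neg if neg else 0.0
        per_attr.append((pos_recall + neg_recall) / 2)
    return sum(per_attr) / num_attrs


# Toy data: 4 images, 2 attributes (hypothetical values).
y_true = [[1, 0], [1, 0], [0, 0], [0, 1]]
y_pred = [[1, 0], [0, 0], [0, 0], [0, 0]]
ma = mean_accuracy(y_true, y_pred)
print(ma)
```

Under this definition, a rare attribute that is never predicted contributes only 0.5 to the average even though its plain accuracy is high. That would explain why the abstract pairs mA with recall when discussing fine-grained attributes.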


Pages: 8

References: 40 in total
