Deep Imbalanced Attribute Classification Using Visual Attention Aggregation

Cited by: 148

Authors:

Sarafianos, Nikolaos [1]

Xu, Xiang [1]

Kakadiaris, Ioannis A. [1]

Affiliations:

[1] Univ Houston, Computat Biomed Lab, Houston, TX 77004 USA

Source:

COMPUTER VISION - ECCV 2018, PT XI | 2018 / Vol. 11215

Keywords:

Visual attributes; Deep imbalanced learning; Visual attention;

DOI:

10.1007/978-3-030-01252-6_42

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

For many computer vision applications, such as image description and human identification, recognizing the visual attributes of humans is an essential yet challenging problem. Its challenges originate from its multi-label nature, the large underlying class imbalance, and the lack of spatial annotations. Existing methods either follow a computer vision approach while failing to account for class imbalance, or explore machine learning solutions that disregard the spatial and semantic relations that exist in the images. With that in mind, we propose an effective method that extracts and aggregates visual attention masks at different scales. We introduce a loss function to handle class imbalance at both the class and the instance level, and further demonstrate that penalizing attention masks with high prediction variance accounts for the weak supervision of the attention mechanism. By identifying and addressing these challenges, we achieve state-of-the-art results with a simple attention mechanism on both the PETA and WIDER-Attribute datasets without additional context or side information.


Pages: 708-725

Page count: 18
