Multilabel Image Classification With Regional Latent Semantic Dependencies

Cited by: 88

Authors:

Zhang, Junjie ^{[1,2]}

Wu, Qi ^{[3,4]}

Shen, Chunhua ^{[3,4]}

Zhang, Jian ^{[2]}

Lu, Jianfeng ^{[1]}

Affiliations:

[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Jiangsu, Peoples R China

[2] Univ Technol Sydney, Fac Engn & Informat Technol, Sydney, NSW 2007, Australia

[3] Univ Adelaide, Australia Ctr Robot Vis, Adelaide, SA 5005, Australia

[4] Univ Adelaide, Sch Comp Sci, Adelaide, SA 5005, Australia

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2018 / Vol. 20 / Issue 10

Keywords:

Multilabel image classification; semantic dependence; deep neural network; ANNOTATION; GRADIENTS;

DOI:

10.1109/TMM.2018.2812605

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Deep convolutional neural networks (CNNs) have demonstrated advanced performance on single-label image classification, and progress has also been made in applying CNN methods to multilabel image classification, which requires annotating objects, attributes, scene categories, etc., in a single shot. Recent state-of-the-art approaches to multilabel image classification exploit the label dependencies in an image at the global level, largely improving the labeling capacity. However, predicting small objects and visual concepts is still challenging due to the limited discrimination of the global visual features. In this paper, we propose a regional latent semantic dependencies model (RLSD) to address this problem. The model includes a fully convolutional localization architecture that localizes the regions that may contain multiple highly dependent labels. The localized regions are then fed to recurrent neural networks to characterize the latent semantic dependencies at the regional level. Experimental results on several benchmark datasets show that the proposed model achieves the best performance compared to state-of-the-art models, especially for predicting small objects occurring in the images. We also set up an upper-bound model (RLSD+ft-RPN) that uses bounding-box coordinates during training, and the experimental results show that RLSD can approach this upper bound without using bounding-box annotations, which is more realistic in real-world settings.


Pages: 2801-2813

Page count: 13

