Recognition of varying size scene images using semantic analysis of deep activation maps

Cited by: 4

Authors:

Gupta, Shikha [1]

Dileep, A. D. [1]

Thenkanidiyoor, Veena [2]

Affiliations:

[1] Indian Inst Technol Mandi, Sch Comp & Elect Engn, Kamand 175001, HP, India

[2] Natl Inst Technol Goa, Dept Comp Sci & Engn, Ponda 403401, Goa, India

Source:

MACHINE VISION AND APPLICATIONS | 2021 / Vol. 32 / Issue 02

Keywords:

Varying size scene images; Scene representation; Semantic multinomial representation; Concept modeling; Pseudo-concept; Scene recognition; REPRESENTATION; FEATURES;

DOI:

10.1007/s00138-021-01168-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Understanding the complex semantic structure of scene images requires mapping the image from pixel space to a high-level semantic space. In semantic space, a scene image is represented by the posterior probabilities of the concepts (e.g., 'car', 'chair', 'window') present in it; this representation is known as the semantic multinomial (SMN) representation. SMN generation requires a concept-annotated dataset for concept modeling, which is infeasible to produce manually because of the large size of the databases. To tackle this issue, we propose a novel approach of building the concept model via pseudo-concepts. A pseudo-concept acts as a proxy for the actual concept and gives a cue for its presence rather than its actual identity. We propose to use filter responses from the deeper convolutional layers of convolutional neural networks (CNNs) as pseudo-concepts, as filters in deeper convolutional layers are trained for different semantic concepts. Most prior work considers fixed-size (approximately 227 × 227) images for semantic analysis, which suppresses many concepts present in the images. In this work, we preserve the true concept structure in images by passing them at their original resolution to the convolutional layers of CNNs. We further propose to prune the non-prominent pseudo-concepts, group the similar ones using kernel clustering, and later model them using a dynamic-based support vector machine. We demonstrate that the resulting SMN representation indeed captures the semantic concepts better and results in state-of-the-art classification accuracy on varying size scene image datasets such as MIT67 and SUN397.
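As a rough illustration of the idea described in the abstract, the sketch below reduces activation maps of arbitrary spatial size to one response per filter and then normalizes those responses into a probability vector over pseudo-concepts. It is not the authors' method: the abstract does not say how responses are pooled or how posteriors are obtained (the paper models pseudo-concepts with support vector machines), so the max pooling and the softmax normalization here are assumptions, and the function names `max_pool_responses` and `semantic_multinomial` are hypothetical.

```python
import math


def max_pool_responses(activation_maps):
    """Reduce each filter's 2-D activation map to its peak response.

    Each map may have a different height and width, which is what allows
    images to be processed at their original resolution.
    (Max pooling is an assumption, not taken from the abstract.)
    """
    return [max(max(row) for row in fmap) for fmap in activation_maps]


def semantic_multinomial(scores):
    """Turn per-concept scores into a vector of non-negative values summing to 1.

    Softmax is used here purely for illustration; it stands in for the
    posterior probabilities that a trained concept model would produce.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical filters whose activation maps differ in size.
maps = [
    [[0.1, 0.9], [0.3, 0.2]],  # 2 x 2 map
    [[0.5, 0.4, 0.1]],         # 1 x 3 map
]
smn = semantic_multinomial(max_pool_responses(maps))
```

Because pooling collapses each map to a single value, the resulting vector has one entry per filter regardless of the input image size.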


Pages: 19

