Dynamic Scene Classification Using Redundant Spatial Scenelets

Cited by: 14

Authors:

Du, Liang [1]

Ling, Haibin [1]

Affiliations:

[1] Temple Univ, Dept Comp & Informat Sci, Philadelphia, PA 19111 USA

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2016 / Volume 46 / Issue 9

Funding:

U.S. National Science Foundation;

Keywords:

Dynamic scene; redundant spatial grouping (RSG);

DOI:

10.1109/TCYB.2015.2466692

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Dynamic scene classification has recently drawn an increasing amount of research effort. While existing methods rely mainly on low-level features, little work has explored the rich spatial layout information in dynamic scenes. Motivated by the fact that dynamic scenes are characterized by both dynamic and static parts with spatial layout priors, we propose to use redundant spatial groupings of a large number of spatiotemporal patches, named scenelets, to represent a dynamic scene. Specifically, each scenelet is associated with a category-dependent scenelet model that encodes the likelihood of a specific scene category. All scenelet models for a scene category are jointly learned to encode the spatial interactions and redundancies among them. Subsequently, a dynamic scene sequence is represented as a collection of category likelihoods estimated by these scenelet models. Such a representation effectively encodes the spatial layout prior together with the associated semantic information, and can be used for classifying dynamic scenes in combination with a standard learning algorithm such as k-nearest neighbor or a linear support vector machine. The effectiveness of our approach is demonstrated on two dynamic scene benchmarks and on a related application, violence video classification. In the nearest neighbor classification framework, our method outperforms the previous state of the art on both the Maryland "in the wild" dataset and the "stabilized" dynamic scene dataset. For violence video classification on a benchmark dataset, our method achieves a promising classification rate of 87.08%, which significantly improves on the previous best result of 81.30%.

Citation

Pages: 2156-2165

Number of pages: 10
