Classification of Cinematographic Shots Using Lie Algebra and its Application to Complex Event Recognition

Cited by: 21
Authors
Bhattacharya, Subhabrata [1]
Mehran, Ramin [2]
Sukthankar, Rahul [3]
Shah, Mubarak [1]
Affiliations
[1] Univ Cent Florida, Ctr Res Comp Vis, Orlando, FL 32826 USA
[2] Microsoft Corp, Redmond, WA 98052 USA
[3] Google Res, Mountain View, CA 94043 USA
Keywords
Cinematographic shots; homography; Lie algebra; multimedia event recognition; shot classification; CAMERA MOTION PARAMETERS; QUALITATIVE ESTIMATION; VIDEO
DOI
10.1109/TMM.2014.2300833
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we propose a discriminative representation of a video shot based on its camera motion and demonstrate how the representation can be used for high level multimedia tasks like complex event recognition. In our technique, we assume that a homography exists between a pair of subsequent frames in a given shot. Using purely image-based methods, we compute homography parameters that serve as coarse indicators of the ambient camera motion. Next, using Lie algebra, we map the homography matrices to an intermediate vector space that preserves the intrinsic geometric structure of the transformation. The mappings are stacked temporally to generate vector time-series per shot. To extract meaningful features from time-series, we propose an efficient linear dynamical system based technique. The extracted temporal features are further used to train linear SVMs as classifiers for a particular shot class. In addition to demonstrating the efficacy of our method on a novel dataset, we extend its applicability to recognize complex events in large scale videos under unconstrained scenarios. Our empirical evaluations on eight cinematographic shot classes show that our technique performs close to approaches that involve extraction of 3-D trajectories using computationally prohibitive structure from motion techniques.
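The Lie algebra step described in the abstract can be illustrated with a minimal sketch. It assumes the mapping is the matrix logarithm of a determinant-normalized homography, and that inter-frame homographies are close to the identity with positive determinant; the abstract does not give the authors' exact parametrization or basis, so the 8-vector layout and the function names below are hypothetical. The linear dynamical system features and the SVM stage are not covered.

```python
import numpy as np
from scipy.linalg import expm, logm


def homography_to_lie_vector(H):
    """Map a 3x3 homography to an 8-vector via the matrix logarithm.

    Assumption: the 8-vector layout (first eight row-major entries of the
    logarithm) is an illustrative choice, not the paper's stated basis.
    """
    H = np.asarray(H, dtype=float)
    det = np.linalg.det(H)
    if det <= 0:
        raise ValueError("expected a homography with positive determinant")
    # Scale so det(H) = 1, placing H in SL(3); its logarithm is then traceless.
    H_unit = H / np.cbrt(det)
    # Near-identity homographies have a real logarithm; drop numerical residue.
    log_H = np.real(logm(H_unit))
    # A traceless 3x3 matrix has 8 free entries; drop the last diagonal one.
    return log_H.flatten()[:8]


def lie_vector_to_homography(v):
    """Inverse map: rebuild the unit-determinant homography from the 8-vector."""
    v = np.asarray(v, dtype=float)
    last = -(v[0] + v[4])  # tracelessness fixes the ninth entry
    return expm(np.append(v, last).reshape(3, 3))


def shot_time_series(homographies):
    """Stack per-frame-pair vectors into a (T, 8) time series for one shot."""
    return np.stack([homography_to_lie_vector(H) for H in homographies])
```

For a pure translation homography the logarithm is simply the translation itself placed in the last column, which is one way to see that the vector space keeps the geometric meaning of the transformation.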
Pages: 686-696
Number of pages: 11
Related papers
32 in total
[1]  
Arijon D., 1976, Grammar of the film language
[2]   SURF: Speeded up robust features [J].
Bay, Herbert ;
Tuytelaars, Tinne ;
Van Gool, Luc .
COMPUTER VISION - ECCV 2006 , PT 1, PROCEEDINGS, 2006, 3951 :404-417
[3]  
Bhattacharya S., 2006, P ACM MULT, P361
[4]  
Bhattacharya S, 2011, AUGMENT VIS REAL, V1, P221, DOI 10.1007/978-3-642-11568-4_10
[5]  
DIMONTE CL, 1990, INT CONF ACOUST SPEE, P2539, DOI 10.1109/ICASSP.1990.116119
[6]   Optimal content-based video decomposition for interactive video navigation [J].
Doulamis, AD ;
Doulamis, ND .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2004, 14 (06) :757-775
[7]   Application of Lie algebras to visual servoing [J].
Drummond, T ;
Cipolla, R .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2000, 37 (01) :21-41
[8]  
Ekenel H. K., 2010, P INT WORKSH AUT INF
[9]   Nonparametric motion characterization using causal probabilistic models for video indexing and retrieval [J].
Fablet, R ;
Bouthemy, P ;
Pérez, P .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2002, 11 (04) :393-407
[10]   ClassView: Hierarchical video shot classification, indexing, and accessing [J].
Fan, JP ;
Elmagarmid, AK ;
Zhu, XQ ;
Aref, WG ;
Wu, LD .
IEEE TRANSACTIONS ON MULTIMEDIA, 2004, 6 (01) :70-86