Exploiting spatial-temporal context for trajectory based action video retrieval

Cited by: 0
Authors
Lelin Zhang
Zhiyong Wang
Tingting Yao
Shin’ichi Satoh
Tao Mei
David Dagan Feng
Affiliations
[1] The University of Sydney,School of Information Technologies
[2] Hefei University of Technology,School of Computer and Information
[3] National Institute of Informatics
[4] Microsoft Research
Source
Multimedia Tools and Applications | 2018 / Volume 77
Keywords
Spatial-temporal information; Descriptor coding; Trajectory matching; Bag-of-visual-words; Action video retrieval;
DOI
Not available
Abstract
Retrieving videos with similar actions is an important task with many applications, yet it is very challenging due to large variations across different videos. While state-of-the-art approaches generally utilize the bag-of-visual-words representation with the dense trajectory feature, the spatial-temporal context among trajectories is overlooked. In this paper, we propose to incorporate such information into the descriptor coding and trajectory matching stages of the retrieval pipeline. Specifically, to capture the spatial-temporal correlations among trajectories, we develop a descriptor coding method based on the correlation between the spatial-temporal and feature aspects of individual trajectories. To deal with the misalignments between dense trajectory segments, we develop an offset-aware distance measure for improved trajectory matching. Our comprehensive experimental results on two popular datasets indicate that the proposed method improves the performance of action video retrieval, especially on more dynamic actions with significant movements and cluttered backgrounds.
Pages: 2057–2081
Page count: 24