Zeus: Efficiently Localizing Actions in Videos using Reinforcement Learning

Cited by: 5
Authors
Chunduri, Pramod [1 ]
Bang, Jaeho [1 ]
Lu, Yao [2 ]
Arulraj, Joy [1 ]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
[2] Microsoft Res, Redmond, WA USA
Source
PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22) | 2022
Keywords
video analytics; video database management systems; action localization; reinforcement learning;
DOI
10.1145/3514221.3526181
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Detection and localization of actions in videos is an important problem in practice. State-of-the-art video analytics systems are unable to efficiently and effectively answer such action queries because actions often involve complex interactions between objects and are spread across a sequence of frames; detecting and localizing them requires computationally expensive deep neural networks, and answering the query effectively requires considering the entire sequence of frames. In this paper, we present ZEUS, a video analytics system tailored for answering action queries. We present a novel technique for efficiently answering these queries using deep reinforcement learning. ZEUS trains a reinforcement learning agent that learns to adaptively modify the input video segments that are subsequently sent to an action classification network. The agent alters the input segments along three dimensions: sampling rate, segment length, and resolution. To meet the user-specified accuracy target, ZEUS's query optimizer trains the agent with an accuracy-aware, aggregate reward function. Evaluation on three diverse video datasets shows that ZEUS outperforms state-of-the-art frame- and window-based filtering techniques by up to 22.1x and 4.7x, respectively, while consistently meeting the user-specified accuracy target across all queries.
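The abstract describes a concrete mechanism: per input segment, the agent picks a (sampling rate, segment length, resolution) configuration, and the reward trades classifier compute against an accuracy-target penalty. Below is a minimal sketch of that idea; the discrete action grid, the cost proxy, the reward weights, and the tabular epsilon-greedy agent are all illustrative assumptions for exposition, not ZEUS's actual deep RL implementation.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical discrete action space over the three dimensions named in
# the abstract; the concrete grid values are illustrative assumptions.
SAMPLING_RATES = [1, 2, 4]       # keep every k-th frame
SEGMENT_LENGTHS = [8, 16, 32]    # frames per input segment
RESOLUTIONS = [112, 160, 224]    # square frame resolution in pixels
ACTIONS = list(itertools.product(SAMPLING_RATES, SEGMENT_LENGTHS, RESOLUTIONS))


def segment_cost(rate, length, res):
    """Proxy for classifier compute: frames processed times pixels per frame."""
    return (length / rate) * res * res


def reward(correct, rate, length, res, met_target, penalty=5.0):
    """Illustrative accuracy-aware reward: cheaper configurations earn more,
    while wrong answers and a missed accuracy target are penalized."""
    r = -segment_cost(rate, length, res) / 1e6
    if not correct:
        r -= penalty
    if not met_target:
        r -= penalty
    return r


class EpsilonGreedyAgent:
    """Toy stand-in for ZEUS's deep RL agent: tabular value estimates
    with epsilon-greedy exploration over the configuration grid."""

    def __init__(self, epsilon=0.1, lr=0.1):
        self.q = defaultdict(float)   # action tuple -> estimated value
        self.epsilon, self.lr = epsilon, lr

    def act(self):
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[a])

    def update(self, action, r):
        # Incremental running-average update toward the observed reward.
        self.q[action] += self.lr * (r - self.q[action])
```

In the paper itself the agent is a deep network conditioned on segment content rather than a context-free table, and correctness signals during training come from the expensive action classification network; but the shape of the trade-off, compute cost versus a penalty for missing the user-specified accuracy target, is the same.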
Pages: 545-558
Page count: 14