A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos

Cited by: 15
Authors
Gleason, Joshua [1 ]
Ranjan, Rajeev [1 ]
Schwarcz, Steven [1 ]
Castillo, Carlos D. [1 ]
Chen, Jun-Cheng [1 ]
Chellappa, Rama [1 ]
Affiliation
[1] University of Maryland, College Park, MD 20742, USA
Source
2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2019
DOI
10.1109/WACV.2019.00021
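Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809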
Abstract
Existing approaches for spatio-temporal action detection in videos are limited by the spatial extent and temporal duration of the actions. In this paper, we present a modular system for spatio-temporal action detection in untrimmed security videos. We propose a two-stage approach. The first stage generates dense spatio-temporal proposals by applying hierarchical clustering and temporal jittering to frame-wise object detections. The second stage is a Temporal Refinement I3D (TRI-3D) network that performs action classification and temporal refinement on the generated proposals. The object-detection-based proposal generation step helps detect actions that occur in a small spatial region of a video frame, while temporal jittering and refinement help detect actions of variable length. Experimental results on the DIVA spatio-temporal action detection dataset show the effectiveness of our system. For comparison, the system is also evaluated on the THUMOS'14 temporal action detection dataset.
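The abstract names temporal jittering as one way the first stage densifies its proposals, but it does not give the scheme. The following is a minimal illustrative sketch, not the authors' implementation: the function names `jitter_segment` and `temporal_iou`, the uniform random shift, and all parameter values are assumptions made here for illustration only.

```python
import random
from typing import List, Tuple

# A temporal segment as (start_frame, end_frame), end exclusive.
Segment = Tuple[int, int]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two temporal segments."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def jitter_segment(
    seg: Segment,
    num_copies: int,
    max_shift: int,
    video_len: int,
    rng: random.Random,
) -> List[Segment]:
    """Return the original segment plus `num_copies` jittered variants.

    Hypothetical scheme: each boundary is shifted independently by a
    uniform random offset in [-max_shift, max_shift], then clipped so
    that 0 <= start < end <= video_len.
    """
    start, end = seg
    out = [seg]
    for _ in range(num_copies):
        s = start + rng.randint(-max_shift, max_shift)
        e = end + rng.randint(-max_shift, max_shift)
        s = max(0, min(s, video_len - 1))   # keep start inside the video
        e = max(s + 1, min(e, video_len))   # keep at least one frame
        out.append((s, e))
    return out
```

In a pipeline of this kind, each jittered segment would be paired with the spatial extent of its source proposal and passed to the second-stage classifier, which can then refine the temporal boundaries; `temporal_iou` is the usual measure for matching such segments against ground truth.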
Pages: 141-150
Number of pages: 10
Related papers
50 in total
  • [1] Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals
    Song, Yeongtaek
    Kim, Incheol
    SENSORS, 2019, 19 (05)
  • [2] Spatio-Temporal Activity Detection and Recognition in Untrimmed Surveillance Videos
    Gkountakos, Konstantinos
    Touska, Despoina
    Ioannidis, Konstantinos
    Tsikrika, Theodora
    Vrochidis, Stefanos
    Kompatsiaris, Ioannis
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021: 451-455
  • [3] JOINT SPATIO-TEMPORAL ACTION LOCALIZATION IN UNTRIMMED VIDEOS WITH PER-FRAME SEGMENTATION
    Duan, Xuhuan
    Wang, Le
    Zhai, Changbo
    Zhang, Qilin
    Niu, Zhenxing
    Zheng, Nanning
    Hua, Gang
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018: 918-922
  • [4] Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation
    Wang, Le
    Duan, Xuhuan
    Zhang, Qilin
    Niu, Zhenxing
    Hua, Gang
    Zheng, Nanning
    SENSORS, 2018, 18 (05)
  • [5] Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization
    Ghamsarian, Negin
    Taschwer, Mario
    Putzgruber-Adamitsch, Doris
    Sarny, Stephanie
    Schoeffmann, Klaus
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 10720-10727
  • [6] Video Imprint Segmentation for Temporal Action Detection in Untrimmed Videos
    Gao, Zhanning
    Wang, Le
    Zhang, Qilin
    Niu, Zhenxing
    Zheng, Nanning
    Hua, Gang
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019: 8328-8335
  • [7] Multi-Instance Multi-Label Action Recognition and Localization Based on Spatio-Temporal Pre-Trimming for Untrimmed Videos
    Zhang, Xiao-Yu
    Shi, Haichao
    Li, Changsheng
    Li, Peng
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34: 12886-12893
  • [8] Temporal Action Detection in Untrimmed Videos from Fine to Coarse Granularity
    Yao, Guangle
    Lei, Tao
    Liu, Xianyuan
    Jiang, Ping
    APPLIED SCIENCES-BASEL, 2018, 8 (10)
  • [9] Improved Spatio-temporal Action Localization for Surveillance Videos
    Liang, Morgan
    Li, Xun
    Onie, Sandersan
    Larsen, Mark
    Sowmya, Arcot
    2021 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA 2021), 2021: 147-154
  • [10] Activity-driven Weakly-Supervised Spatio-Temporal Grounding from Untrimmed Videos
    Chen, Junwen
    Bao, Wentao
    Kong, Yu
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 3789-3797