Spatio-temporal human action localization in indoor surveillances

Cited by: 3
Authors
Liu, Zihao [1 ]
Yan, Danfeng [1 ]
Cai, Yuanqiang [1 ]
Song, Yan [2 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, Natl Pilot Software Engn Sch, Beijing 100876, Peoples R China
[2] Shanghai Int Studies Univ, Sch Business & Management, Shanghai 200083, Peoples R China
Keywords
Video analysis; Spatio-temporal action localization dataset; Real-world indoor surveillance
DOI
10.1016/j.patcog.2023.110087
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Spatio-temporal action localization is a crucial and challenging task in the field of video understanding. Existing benchmarks for spatio-temporal action detection are limited by factors such as incomplete annotations, high-level non-universal actions, and uncommon scenarios. To address these limitations and facilitate research in real-world security applications, we introduce a novel human-centric dataset for spatio-temporal localization of atomic actions in indoor surveillance settings, termed HIA (Human-centric Indoor Actions). The HIA dataset is constructed by selecting 30 atomic action classes, compiling 100 surveillance videos, and annotating 219,225 frames with 370,937 bounding boxes. The primary characteristics of HIA include (1) accurate spatio-temporal annotations for atomic actions, (2) human-centric annotations at the frame level, (3) temporal linking of persons across discontinuous tracks, and (4) utilization of indoor surveillance videos. With its realistic indoor surveillance scenes and comprehensive annotations, HIA presents a valuable and novel challenge to the spatio-temporal action localization domain. To establish a benchmark, we evaluate various methods and provide an in-depth analysis of the HIA dataset. The HIA dataset will be made available soon, and we anticipate that it will serve as a standard and practical benchmark for the research community.
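The abstract's third characteristic, temporal linking of persons across discontinuous tracks, means a person who leaves and re-enters the view keeps one identity even though the box sequence has gaps. A minimal sketch of how such frame-level annotations might be grouped into per-identity track segments is shown below; the field names (`video_id`, `person_id`, `bbox`, etc.) are illustrative assumptions, not the official HIA annotation format.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class BoxAnnotation:
    # Hypothetical schema, for illustration only (not the official HIA format).
    video_id: str
    frame_idx: int
    person_id: int    # identity preserved across discontinuous tracks
    action_label: str # one of the 30 atomic action classes
    bbox: tuple       # (x1, y1, x2, y2) in pixels

def group_tracks(annotations):
    """Group frame-level boxes into per-person track segments.

    Splits a person's box sequence at frame gaps, so one identity maps
    to several temporally linked but discontinuous segments.
    """
    per_person = defaultdict(list)
    for a in annotations:
        per_person[(a.video_id, a.person_id)].append(a)

    tracks = {}
    for key, boxes in per_person.items():
        boxes.sort(key=lambda a: a.frame_idx)
        segments, current = [], [boxes[0]]
        for a in boxes[1:]:
            if a.frame_idx == current[-1].frame_idx + 1:
                current.append(a)      # consecutive frame: extend segment
            else:
                segments.append(current)  # gap: close segment, same identity
                current = [a]
        segments.append(current)
        tracks[key] = segments
    return tracks
```

For example, boxes on frames 1, 2, 5, 6 for one person yield two segments under the same identity, which is the kind of discontinuous-yet-linked structure the annotation scheme describes.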
Pages: 11
Related papers
50 in total
  • [21] Leveraging Multimodal Knowledge for Spatio-temporal Action Localization
    Chen, Keke
    Tu, Zhewei
    Shu, Xiangbo
    2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS, ICMEW 2024, 2024,
  • [22] Real-time Spatio-Temporal Action Localization in 360 Videos
    Chen, Bo
    Ali-Eldin, Ahmed
    Shenoy, Prashant
    Nahrstedt, Klara
    2020 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2020), 2020, : 73 - 76
  • [23] Video action re-localization using spatio-temporal correlation
    Ramaswamy, Akshaya
    Seemakurthy, Karthik
    Gubbi, Jayavardhana
    Balamuralidhar, P.
    2022 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW 2022), 2022, : 192 - 201
  • [24] Spatio-temporal hybrid Anderson localization
    Lobanov, V. E.
    Borovkova, O. V.
    Kartashov, Y. V.
    Szameit, A.
    EPL, 2014, 108 (06)
  • [25] Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos
    Duta, Ionut C.
    Ionescu, Bogdan
    Aizawa, Kiyoharu
    Sebe, Nicu
    MULTIMEDIA MODELING (MMM 2017), PT I, 2017, 10132 : 365 - 378
  • [26] Hierarchical and Spatio-Temporal Sparse Representation for Human Action Recognition
    Tian, Yi
    Kong, Yu
    Ruan, Qiuqi
    An, Gaoyun
    Fu, Yun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (04) : 1748 - 1762
  • [27] Human Action Recognition Based on a Spatio-Temporal Video Autoencoder
    Sousa e Santos, Anderson Carlos
    Pedrini, Helio
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2020, 34 (11)
  • [28] Spatio-temporal SIFT and Its Application to Human Action Classification
    Al Ghamdi, Manal
    Zhang, Lei
    Gotoh, Yoshihiko
    COMPUTER VISION - ECCV 2012: WORKSHOPS AND DEMONSTRATIONS, PT I, 2012, 7583 : 301 - 310
  • [29] Spatio-Temporal Information Fusion and Filtration for Human Action Recognition
    Zhang, Man
    Li, Xing
    Wu, Qianhan
    SYMMETRY-BASEL, 2023, 15 (12):
  • [30] Bag of Spatio-temporal Synonym Sets for Human Action Recognition
    Pang, Lin
    Cao, Juan
    Guo, Junbo
    Lin, Shouxun
    Song, Yan
    ADVANCES IN MULTIMEDIA MODELING, PROCEEDINGS, 2010, 5916 : 422 - 432