Cluster-guided temporal modeling for action recognition

Authors
Kim, Jeong-Hun [1]
Hao, Fei [2]
Leung, Carson Kai-Sang [3]
Nasridinov, Aziz [1]
Affiliations
[1] Chungbuk Natl Univ, Dept Comp Sci, Cheongju 28644, South Korea
[2] Shaanxi Normal Univ, Sch Comp Sci, Xian 710119, Peoples R China
[3] Univ Manitoba, Dept Comp Sci, Winnipeg, MB R3T 2N2, Canada
Funding
National Research Foundation, Singapore;
Keywords
Keyframe selection; Temporal modeling; Temporal redundancy; Data clustering; Action recognition;
DOI
10.1007/s13735-023-00280-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Action recognition is a video understanding task that aims to recognize the action of an object in a video. To recognize the action, it is necessary to extract motion information through temporal modeling. However, videos typically contain high temporal redundancy, such as repeated events and near-identical adjacent frames. This redundancy weakens the information related to the actual action, making it difficult for the final classifier to recognize the action. In this article, we focus on preserving helpful information for action recognition by reducing the high temporal redundancy in videos. To achieve this goal, we propose a novel frame selection method called cluster-guided frame selection (CluFrame). Specifically, in its temporal compression (TC) module, CluFrame compresses an input video into keyframes of the clusters discovered by applying k-means clustering to frame-wise features extracted from pre-trained 2D-CNNs. In addition, CluFrame selects keyframes related to the action of the input video by optimizing the TC module based on the action recognition results. Experimental results on five benchmark datasets demonstrate that CluFrame addresses the high temporal redundancy in videos, improving action recognition accuracy by up to 6.6% over existing action recognition methods and by about 0.7% over state-of-the-art frame selection methods.
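The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes frame-wise features have already been extracted (for instance by a pre-trained 2D-CNN) into an array of shape (frames, dimensions), it runs plain k-means with centroids initialised from evenly spaced frames, and it takes the frame closest to each centroid as that cluster's keyframe. The function name `select_keyframes`, the initialisation scheme, and the nearest-to-centroid rule are all assumptions made for illustration. The optimisation of the TC module from recognition results is not covered.

```python
import numpy as np


def select_keyframes(features, k, n_iter=50):
    """Return (sorted keyframe indices, cluster label per frame).

    Hypothetical helper: k-means over frame-wise features, then one
    keyframe per non-empty cluster (the frame nearest its centroid).
    """
    features = np.asarray(features, dtype=np.float64)
    n_frames = features.shape[0]
    if not 1 <= k <= n_frames:
        raise ValueError("k must be between 1 and the number of frames")

    # Assumption: initialise centroids from k frames evenly spaced in time.
    init_idx = np.round(np.linspace(0, n_frames - 1, k)).astype(int)
    centroids = features[init_idx].copy()

    for _ in range(n_iter):
        # Squared Euclidean distance of every frame to every centroid: (T, k).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[c] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final assignment against the converged centroids.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)

    keyframes = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue
        # The member frame closest to the centroid represents the cluster.
        keyframes.append(int(members[dists[members, c].argmin()]))
    return sorted(keyframes), labels
```

For example, calling `select_keyframes` with `k=3` on nine one-dimensional features that form three tight groups of three consecutive frames returns one frame index per group, so the nine frames are compressed to three keyframes in temporal order.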
Pages: 18