Weakly supervised graph learning for action recognition in untrimmed video

Cited: 3
Authors
Yao, Xiao [1 ]
Zhang, Jia [1 ]
Chen, Ruixuan [1 ]
Zhang, Dan [2 ]
Zeng, Yifeng [1 ]
Affiliations
[1] Hohai Univ, Coll IoT Engn, Nanjing, Peoples R China
[2] Inner Mongolia Normal Univ, Coll Foreign Languages, Hohhot, Peoples R China
Keywords
Action recognition; Weakly supervised; Proposal relations; GCNs
DOI
10.1007/s00371-022-02673-1
Chinese Library Classification
TP31 [Computer Software]
Discipline Codes
081202; 0835
Abstract
Action recognition in real-world scenarios is a challenging task that involves both localizing and classifying actions in untrimmed video. Because untrimmed video in real scenarios lacks fine-grained annotation, existing supervised learning methods are limited in effectiveness and robustness. Moreover, state-of-the-art methods treat each action proposal individually, ignoring the semantic relationships between proposals that arise from the temporal continuity of video. To address these issues, we propose a weakly supervised approach that explores proposal relations using Graph Convolutional Networks (GCNs). Specifically, the method introduces action similarity edges and temporal similarity edges to represent the contextual semantic relationships between proposals during graph construction, and uses the similarity of action features to weakly supervise the spatial semantic relationship between labeled and unlabeled samples, enabling effective recognition of actions in video. We validate the proposed method on public benchmarks for untrimmed video (THUMOS14 and ActivityNet). The experimental results demonstrate that the proposed method achieves state-of-the-art results, with better robustness and generalization performance.
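The graph construction described in the abstract can be made concrete with a small sketch. The following is a minimal, hypothetical illustration in PyTorch, not the authors' implementation: proposals are graph nodes, action similarity edges connect proposals with similar features, temporal similarity edges connect proposals that are close in time, and a two-layer GCN propagates information over the fused graph to classify each proposal. All names, thresholds, and dimensions (build_proposal_graph, sim_thresh, time_thresh, the 1024-d features) are illustrative assumptions.

```python
# Hypothetical sketch of a proposal graph with two edge types, as described
# in the abstract. Names and thresholds are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


def build_proposal_graph(feats, centers, sim_thresh=0.7, time_thresh=0.1):
    """feats: (N, D) proposal features; centers: (N,) normalized temporal centers."""
    # Action similarity edges: high cosine similarity between proposal features.
    f = F.normalize(feats, dim=1)
    act_adj = (f @ f.t() > sim_thresh).float()
    # Temporal similarity edges: proposals whose temporal centers are close.
    dist = (centers[:, None] - centers[None, :]).abs()
    tmp_adj = (dist < time_thresh).float()
    # Fuse the two edge types and add self-loops.
    adj = ((act_adj + tmp_adj) > 0).float() + torch.eye(feats.size(0))
    # Symmetric normalization: D^{-1/2} A D^{-1/2}.
    d = adj.sum(1).clamp(min=1e-6).pow(-0.5)
    return d[:, None] * adj * d[None, :]


class ProposalGCN(nn.Module):
    """Two-layer GCN producing per-proposal class logits."""
    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, num_classes)

    def forward(self, feats, adj):
        h = F.relu(self.w1(adj @ feats))  # propagate over graph, then transform
        return self.w2(adj @ h)           # (N, num_classes) logits


# Toy usage: 8 proposals with 1024-d features and temporal centers in [0, 1].
feats = torch.randn(8, 1024)
centers = torch.rand(8)
adj = build_proposal_graph(feats, centers)
logits = ProposalGCN(1024, 256, num_classes=20)(feats, adj)
print(logits.shape)  # torch.Size([8, 20])
```

In a weakly supervised setting along the lines the abstract sketches, logits for proposals linked by high-similarity edges to labeled samples could serve as soft targets for the unlabeled ones; the details of that supervision are in the paper itself.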
Pages: 5469-5483
Page count: 15