Unsupervised Video-Based Action Recognition With Imagining Motion and Perceiving Appearance

Cited by: 8
Authors
Lin, Wei [1 ]
Liu, Xiaoyu [1 ]
Zhuang, Yihong [1 ]
Ding, Xinghao [2 ]
Tu, Xiaotong [1 ]
Huang, Yue [2 ]
Zeng, Huanqiang [3 ,4 ]
Affiliations
[1] Xiamen Univ, Sch Informat, Xiamen 361005, Peoples R China
[2] Xiamen Univ, Sch Informat, Inst Artificial Intelligence, Xiamen 361005, Peoples R China
[3] Huaqiao Univ, Sch Engn, Quanzhou 362021, Peoples R China
[4] Huaqiao Univ, Sch Informat Sci & Engn, Quanzhou 362021, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Videos; Feature extraction; Image recognition; Task analysis; Character recognition; Data mining; Electronic mail; Action recognition; unsupervised; imagine; NETWORKS;
DOI
10.1109/TCSVT.2022.3221280
CLC classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline codes
0808 ; 0809 ;
Abstract
Video-based action recognition is a challenging task that demands careful consideration of the temporal properties of videos in addition to their appearance attributes. In particular, the temporal domain of raw videos usually contains significantly more redundant or irrelevant information than still images. To this end, this paper proposes an unsupervised video-based action recognition approach with imagining motion and perceiving appearance, called IMPA, which comprehensively learns the spatio-temporal characteristics inherent in videos, with a particular emphasis on the moving object. Specifically, a self-supervised Motion Extracting Block (MEB) is designed to extract the principal motion features by focusing on the large movements of the moving object, based on the observation that humans can infer complete motion trajectories from partially observed moving objects. To further account for the indispensable appearance attributes of videos, an unsupervised Appearance Learning Block (ALB) is developed to perceive the static appearance, which is combined with the MEB to recognize actions. Extensive validation experiments and ablation studies on multiple datasets demonstrate that the proposed IMPA approach obtains superior performance and surpasses other classical and state-of-the-art unsupervised action recognition methods.
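The abstract describes a two-branch design: a motion branch (MEB) that emphasizes large movements and an appearance branch (ALB) that captures static content, fused for recognition. The following is a minimal NumPy sketch of that two-branch idea only; frame differencing and mean pooling are illustrative simplifications chosen here for brevity, not the paper's actual self-supervised MEB/ALB architectures, and the function names are hypothetical.

```python
import numpy as np

def motion_features(frames):
    """Toy stand-in for a motion branch (cf. the paper's MEB):
    absolute frame differences emphasize large movements."""
    diffs = np.abs(np.diff(frames, axis=0))   # (T-1, H, W) temporal differences
    return diffs.mean(axis=(1, 2))            # (T-1,) pooled motion energy per step

def appearance_features(frames):
    """Toy stand-in for an appearance branch (cf. the paper's ALB):
    per-frame mean intensity as a crude static-appearance descriptor."""
    return frames.mean(axis=(1, 2))           # (T,) one value per frame

def video_descriptor(frames):
    """Fuse motion and appearance cues into one vector,
    mirroring the combination of the two branches at a high level."""
    return np.concatenate([motion_features(frames), appearance_features(frames)])

rng = np.random.default_rng(0)
video = rng.random((8, 16, 16))               # 8 frames of 16x16 "pixels"
desc = video_descriptor(video)
print(desc.shape)                             # (15,) = 7 motion + 8 appearance values
```

In a real system both branches would be learned networks and the fused descriptor would feed a classifier; the sketch only shows how motion and appearance evidence can be extracted separately and concatenated.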
Pages: 2245-2258
Page count: 14