RASNet: A Reinforcement Assistant Network for Frame Selection in Video-based Posture Recognition

Cited by: 0
Authors
Hu, Ruotong [1]
Wang, Xianzhi [2]
Chang, Xiaojun [2]
Hu, Yeqi [1]
Xin, Xiaowei [1]
Ding, Xiangqian [1]
Guo, Baoqi [3]
Affiliations
[1] Ocean Univ China, Fac Informat Sci & Engn, Qingdao, Peoples R China
[2] Univ Technol Sydney, Fac Engn & Informat Technol, Sydney, NSW, Australia
[3] NEWSTAR Software & Consulting CO LTD, Qingdao, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023
Keywords
Video-based posture recognition; Frame selection strategy; Policy-based reinforcement learning
DOI
10.1109/ICME55011.2023.00366
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most existing video-based posture recognition methods treat all frames equally, using uniform or random sampling strategies, and thus lose the temporal relationships among frames. To address this problem, we propose a lightweight framework, RASNet, that adaptively selects informative frames for recognition. Specifically, we design an exploration environment suited to video to guide the agent in learning the selection strategy. We introduce a reparametrization method that converts the discrete action space into a continuous one, making the agent more robust while keeping its actions stochastic. For the reward, we design a multi-factor function that rewards the agent for balancing frame usage against accuracy. Extensive experiments on three large-scale datasets demonstrate the effectiveness of RASNet; for example, it achieves 85.9% accuracy on Kinetics 600 while using 1.15 fewer frames than other state-of-the-art methods.
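
The abstract names two ingredients without giving formulas: a reparametrization that relaxes the discrete frame choice into a continuous space, and a reward that trades frame usage against accuracy. The sketch below only illustrates those two ideas and is not the authors' implementation. It assumes a Gumbel-softmax relaxation (the abstract does not say which reparametrization RASNet uses) and a hypothetical two-factor reward; the names `gumbel_softmax`, `reward`, and `usage_weight`, and the example scores, are all invented for illustration.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Relax a discrete choice over frames into a continuous probability vector.

    Assumption: Gumbel-softmax stands in for the unspecified
    reparametrization mentioned in the abstract.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise is -log(-log(u)) for u ~ Uniform(0, 1).
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def reward(correct, frames_used, total_frames, usage_weight=0.5):
    """Hypothetical reward: an accuracy term minus a frame-usage penalty."""
    accuracy_term = 1.0 if correct else -1.0
    usage_penalty = usage_weight * frames_used / total_frames
    return accuracy_term - usage_penalty


if __name__ == "__main__":
    rng = random.Random(0)
    frame_scores = [0.2, 1.5, -0.3, 0.9]  # hypothetical per-frame logits
    weights = gumbel_softmax(frame_scores, temperature=0.5, rng=rng)
    picked = max(range(len(weights)), key=weights.__getitem__)
    print("selected frame index:", picked)
    print("reward:", reward(correct=True, frames_used=4, total_frames=16))
```

Lower temperatures push the relaxed weights toward a one-hot selection, while the injected noise keeps the choice stochastic. Under this reward, a correct prediction made from fewer frames always scores higher than the same prediction made from more frames.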
Pages: 2141-2146
Page count: 6