RASNet: A Reinforcement Assistant Network for Frame Selection in Video-based Posture Recognition

Cited by: 0

Authors:

Hu, Ruotong [1]

Wang, Xianzhi [2]

Chang, Xiaojun [2]

Hu, Yeqi [1]

Xin, Xiaowei [1]

Ding, Xiangqian [1]

Guo, Baoqi [3]

Affiliations:

[1] Ocean Univ China, Fac Informat Sci & Engn, Qingdao, Peoples R China

[2] Univ Technol Sydney, Fac Engn & Informat Technol, Sydney, NSW, Australia

[3] NEWSTAR Software & Consulting CO LTD, Qingdao, Peoples R China

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023

Keywords:

Video-based posture recognition; Frame selection strategy; Policy-based reinforcement learning;

DOI:

10.1109/ICME55011.2023.00366

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Most existing video-based posture recognition methods treat frames equally using unified or random sampling strategies, thus losing the temporal relationship information among frames. To address this problem, we propose a lightweight framework, namely RASNet, to adaptively select informative frames for recognition. Specifically, we design a video-suited exploration environment to guide the agent in learning the selection strategy. We introduce the reparametrization method to convert the discrete action space into a continuous space, making the agent robust and random. For the reward part, we design a multi-factor function that rewards the agent for keeping a balance between frame usage and accuracy. Extensive experiments on three large-scale datasets prove the effectiveness of RASNet, e.g., achieving 85.9% accuracy with 1.15 fewer frames than other state-of-the-art methods on Kinetics 600.
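The abstract names two mechanisms without giving formulas: a reparametrization that relaxes the discrete frame-selection action into a continuous one, and a multi-factor reward that trades frame usage against accuracy. The sketch below is only an illustration under stated assumptions, not the authors' method: it swaps in the widely used Gumbel-softmax relaxation, and the reward form, the function names `gumbel_softmax` and `balance_reward`, and the parameters `tau` and `usage_weight` are all hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relax a categorical choice over frames into a continuous probability vector.

    Assumption: Gumbel-softmax stands in for the unspecified reparametrization.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def balance_reward(correct, frames_used, total_frames, usage_weight=0.5):
    """Hypothetical reward: +1 or -1 for the prediction, minus a frame-usage penalty."""
    accuracy_term = 1.0 if correct else -1.0
    usage_term = usage_weight * frames_used / total_frames
    return accuracy_term - usage_term


if __name__ == "__main__":
    rng = random.Random(0)
    weights = gumbel_softmax([0.1, 2.0, -1.0], tau=0.5, rng=rng)
    print("relaxed frame weights:", weights)
    print("reward:", balance_reward(True, frames_used=4, total_frames=16))
```

Lower values of `tau` push the relaxed weights toward a one-hot choice, while the random noise keeps the selection stochastic during exploration. With this reward form, a correct prediction made from fewer frames always scores higher than the same prediction made from more frames.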


Pages: 2141-2146

Page count: 6

