Action recognition based on multimode fusion for VR online platform

Cited by: 9
Authors
Li, Xuan [1]
Chen, Hengxin [1]
He, Shengdong [1]
Chen, Xinrun [1]
Dong, Shuang [1]
Yan, Ping [1]
Fang, Bin [1]
Affiliations
[1] Chongqing Univ, Coll Comp Sci, Chongqing 400044, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data augmentation; Action recognition; Virtual reality online platform; Remote education; Virtual reality
DOI
10.1007/s10055-023-00773-4
CLC number
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
Current popular online communication platforms convey information only through text, voice, pictures, and other electronic means; the richness and reliability of this information are not comparable to those of traditional face-to-face communication. Using virtual reality (VR) technology for online communication is a viable alternative to face-to-face communication. On current VR online communication platforms, users appear in the virtual world as avatars, which enables "face-to-face" communication to a certain extent. However, the avatar's actions do not follow the user's, which makes the communication process less realistic. Decision-makers need to act on the behavior of VR users, but there are no effective methods for collecting action data in VR environments. In our work, three modalities of nine actions performed by VR users are collected using the built-in sensors of a virtual reality head-mounted display (VR HMD), RGB cameras, and human pose estimation. Using these data and advanced multimodal fusion action recognition networks, we obtain a highly accurate action recognition model. In addition, we take advantage of the VR HMD to collect 3D position data and design a 2D key point augmentation scheme for VR users. Using the augmented 2D key point data together with the VR HMD sensor data, we can train action recognition models with high accuracy and strong stability. Our data collection and experiments focus on classroom scenes, and the results can be extended to other scenes.
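The record does not include implementation details of the 2D key point augmentation scheme. As a rough illustration of how 3D position data from a VR HMD might drive such an augmentation, here is a minimal sketch under a pinhole-camera assumption: a small virtual shift of the user in depth rescales the projected skeleton, and a lateral shift translates it. All names and parameters (`augment_keypoints_2d`, `focal`, `num_samples`) are hypothetical and not taken from the paper.

```python
import numpy as np

def augment_keypoints_2d(keypoints, hmd_pos, focal=600.0, num_samples=4, rng=None):
    """Hypothetical sketch: jitter 2D key points consistently with small
    virtual displacements of the user's 3D HMD position.

    keypoints : (K, 2) array of 2D joint coordinates in pixels
    hmd_pos   : (3,) array, user position from the VR HMD in camera
                coordinates (metres), with z the depth along the optical axis
    focal     : assumed pinhole focal length in pixels
    """
    rng = rng if rng is not None else np.random.default_rng()
    samples = []
    for _ in range(num_samples):
        # Sample a small 3D displacement of the user (a few centimetres).
        delta = rng.normal(scale=0.05, size=3)
        new_pos = hmd_pos + delta
        # Pinhole model: a point at depth z projects at f*X/z, so moving
        # the subject from depth z to z' rescales the 2D skeleton by z/z',
        # and lateral motion (dx, dy) translates it by f*(dx, dy)/z'.
        scale = hmd_pos[2] / new_pos[2]
        shift = focal * delta[:2] / new_pos[2]
        center = keypoints.mean(axis=0)
        samples.append((keypoints - center) * scale + center + shift)
    return samples

# Example usage with 18 OpenPose-style joints and a user ~2 m from the camera.
kps = np.random.rand(18, 2) * 480
views = augment_keypoints_2d(kps, np.array([0.2, 1.5, 2.0]))
```

The point of the sketch is only that the HMD's 3D position gives a geometrically consistent way to synthesize new 2D skeletons, rather than applying arbitrary scale and translation jitter; the paper's actual scheme may differ.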
Pages: 1797-1812
Number of pages: 16