Progressive Cross-modal Knowledge Distillation for Human Action Recognition

Cited by: 9
Authors
Ni, Jianyuan [1 ]
Ngu, Anne H. H. [1 ]
Yan, Yan [2 ]
Affiliations
[1] Texas State University, San Marcos, TX, USA
[2] Illinois Institute of Technology, Chicago, IL 60616, USA
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Keywords
Knowledge distillation; Progressive learning; Sensor-based human activity recognition; Machine learning; Ensemble
DOI
10.1145/3503161.3548238
CLC Number
TP39 [Applications of Computers]
Subject Classification Codes
081203; 0835
Abstract
Wearable sensor-based Human Action Recognition (HAR) has achieved remarkable success recently. However, its accuracy still falls far behind that of systems based on visual modalities (i.e., RGB video, skeleton, and depth). Diverse input modalities can provide complementary cues and thus improve the accuracy of HAR, but how to take advantage of multi-modal data in wearable sensor-based HAR has rarely been explored. Current wearable devices, e.g., smartwatches, can only capture a limited range of non-visual modality data. This hinders multi-modal association, since visual and non-visual modality data cannot be used simultaneously. Another major challenge lies in efficiently utilizing multi-modal data on wearable devices, given their limited computation resources. In this work, we propose a novel Progressive Skeleton-to-sensor Knowledge Distillation (PSKD) model that uses only time-series data, i.e., accelerometer data from a smartwatch, to solve the wearable sensor-based HAR problem. Specifically, we construct multiple teacher models using data from both the teacher (human skeleton sequences) and student (time-series accelerometer data) modalities. In addition, we propose an effective progressive learning scheme to close the performance gap between the teacher and student models. We also design a novel loss function, called Adaptive-Confidence Semantic (ACS), which allows the student model to adaptively select either one of the teacher models or the ground-truth label as the target to mimic. To demonstrate the effectiveness of the proposed PSKD method, we conduct extensive experiments on the Berkeley-MHAD, UTD-MHAD, and MMAct datasets. The results confirm that PSKD achieves competitive performance compared to previous mono-sensor-based HAR methods.
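This record does not give the ACS formulation itself; purely as an illustration of the selection idea described in the abstract, the following is a minimal PyTorch sketch. The function name, the use of the true-class probability as the confidence measure, and the threshold-based fallback to the ground-truth label are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def adaptive_confidence_loss(student_logits, teacher_logits_list, labels,
                             tau=4.0, conf_threshold=0.5):
    """Illustrative sketch (not the paper's exact ACS loss): per sample,
    distill from the most confident teacher if its confidence on the true
    class exceeds a threshold; otherwise fall back to the ground-truth label.

    student_logits:      (B, C) logits from the sensor-based student.
    teacher_logits_list: list of (B, C) logits, one per teacher model.
    labels:              (B,) ground-truth class indices.
    """
    log_p_s = F.log_softmax(student_logits / tau, dim=1)

    kd_losses, confidences = [], []
    for t_logits in teacher_logits_list:
        p_t = F.softmax(t_logits / tau, dim=1)
        # Per-sample KL(teacher || student), scaled by tau^2 as in standard KD.
        kd = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1) * tau ** 2
        kd_losses.append(kd)
        # Assumed confidence measure: probability the teacher assigns
        # to the true class of each sample.
        conf = F.softmax(t_logits, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1)
        confidences.append(conf)

    kd_losses = torch.stack(kd_losses)      # (T, B)
    confidences = torch.stack(confidences)  # (T, B)

    # Pick the most confident teacher for each sample.
    best_conf, best_idx = confidences.max(dim=0)                     # (B,)
    best_kd = kd_losses.gather(0, best_idx.unsqueeze(0)).squeeze(0)  # (B,)

    # Fall back to hard-label cross-entropy when no teacher is confident.
    ce = F.cross_entropy(student_logits, labels, reduction="none")   # (B,)
    use_teacher = (best_conf > conf_threshold).float()
    return (use_teacher * best_kd + (1.0 - use_teacher) * ce).mean()


# Example usage with random tensors (8 samples, 27 classes, 2 teachers):
student_logits = torch.randn(8, 27)
teachers = [torch.randn(8, 27), torch.randn(8, 27)]
labels = torch.randint(0, 27, (8,))
loss = adaptive_confidence_loss(student_logits, teachers, labels)
```

The per-sample gating is what makes the selection "adaptive": easy samples with a confident teacher receive soft distillation targets, while samples all teachers misjudge are supervised directly by the label. The actual ACS criterion in the paper may differ.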
Pages: 5903-5912
Number of pages: 10