Progressive Cross-modal Knowledge Distillation for Human Action Recognition

Cited by: 9
Authors
Ni, Jianyuan [1 ]
Ngu, Anne H. H. [1 ]
Yan, Yan [2 ]
Affiliations
[1] Texas State University, San Marcos, TX, USA
[2] Illinois Institute of Technology, Chicago, IL 60616, USA
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Keywords
Knowledge distillation; Progressive learning; Sensor-based human activity recognition; Machine learning; Ensemble
DOI
10.1145/3503161.3548238
CLC Number
TP39 [Applications of Computers]
Subject Classification Codes
081203; 0835
Abstract
Wearable sensor-based Human Action Recognition (HAR) has achieved remarkable success recently. However, its accuracy still falls far behind that of systems based on visual modalities (i.e., RGB video, skeleton, and depth). Diverse input modalities can provide complementary cues and thus improve the accuracy of HAR, but how to take advantage of multi-modal data in wearable sensor-based HAR has rarely been explored. Current wearable devices, e.g., smartwatches, can only capture a limited range of non-visual modality data. This hinders multi-modal association, since visual and non-visual modality data cannot be used simultaneously. Another major challenge lies in efficiently utilizing multi-modal data on wearable devices, given their limited computation resources. In this work, we propose a novel Progressive Skeleton-to-sensor Knowledge Distillation (PSKD) model that uses only time-series data, i.e., accelerometer data from a smartwatch, to solve the wearable sensor-based HAR problem. Specifically, we construct multiple teacher models using data from both the teacher (human skeleton sequences) and student (time-series accelerometer data) modalities. In addition, we propose an effective progressive learning scheme to close the performance gap between the teacher and student models. We also design a novel loss function, called Adaptive-Confidence Semantic (ACS), which allows the student model to adaptively select either one of the teacher models or the ground-truth label as the target to mimic. To demonstrate the effectiveness of the proposed PSKD method, we conduct extensive experiments on the Berkeley-MHAD, UTD-MHAD, and MMAct datasets. The results confirm that PSKD achieves competitive performance compared to previous mono-sensor-based HAR methods.
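This record does not give the ACS formulation itself; purely as an illustration of the selection idea described in the abstract, the following is a minimal PyTorch sketch. The function name, the use of the true-class probability as the confidence measure, and the threshold-based fallback to the ground-truth label are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def adaptive_confidence_loss(student_logits, teacher_logits_list, labels,
                             tau=4.0, conf_threshold=0.5):
    """Illustrative sketch (not the paper's exact ACS loss): per sample,
    distill from the most confident teacher if its confidence on the true
    class exceeds a threshold; otherwise fall back to the ground-truth label.

    student_logits:      (B, C) logits from the sensor-based student.
    teacher_logits_list: list of (B, C) logits, one per teacher model.
    labels:              (B,) ground-truth class indices.
    """
    log_p_s = F.log_softmax(student_logits / tau, dim=1)

    kd_losses, confidences = [], []
    for t_logits in teacher_logits_list:
        p_t = F.softmax(t_logits / tau, dim=1)
        # Per-sample KL(teacher || student), scaled by tau^2 as in standard KD.
        kd = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1) * tau ** 2
        kd_losses.append(kd)
        # Assumed confidence measure: probability the teacher assigns
        # to the true class of each sample.
        conf = F.softmax(t_logits, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1)
        confidences.append(conf)

    kd_losses = torch.stack(kd_losses)      # (T, B)
    confidences = torch.stack(confidences)  # (T, B)

    # Pick the most confident teacher for each sample.
    best_conf, best_idx = confidences.max(dim=0)                     # (B,)
    best_kd = kd_losses.gather(0, best_idx.unsqueeze(0)).squeeze(0)  # (B,)

    # Fall back to hard-label cross-entropy when no teacher is confident.
    ce = F.cross_entropy(student_logits, labels, reduction="none")   # (B,)
    use_teacher = (best_conf > conf_threshold).float()
    return (use_teacher * best_kd + (1.0 - use_teacher) * ce).mean()


# Example usage with random tensors (8 samples, 27 classes, 2 teachers):
student_logits = torch.randn(8, 27)
teachers = [torch.randn(8, 27), torch.randn(8, 27)]
labels = torch.randint(0, 27, (8,))
loss = adaptive_confidence_loss(student_logits, teachers, labels)
```

The per-sample gating is what makes the selection "adaptive": easy samples with a confident teacher receive soft distillation targets, while samples all teachers misjudge are supervised directly by the label. The actual ACS criterion in the paper may differ.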
Pages: 5903-5912
Number of pages: 10