Multimodal and Multi-granularity Graph Convolutional Networks for Elderly Daily Activity Recognition

Authors
Ding J. [1]
Shu X.-B. [1]
Huang P. [1]
Yao Y.-Z. [1]
Song Y. [1]
Affiliations
[1] School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing
Source
Ruan Jian Xue Bao/Journal of Software | 2023, Vol. 34, No. 5
Keywords
elderly activity recognition; graph convolutional network (GCN); multi-granularity; multimodal
DOI
10.13328/j.cnki.jos.006439
Abstract
As population aging becomes more serious, increasing attention is being paid to the safety of elderly people who are at home alone. To provide early warnings, alarms, and reports of dangerous behaviors, several domestic and international research institutions are studying intelligent monitoring of the daily activities of the elderly from a robot's viewpoint. To promote the industrialization of these technologies, this work studies how to automatically recognize the daily activities of the elderly, such as “drinking water”, “washing hands”, “reading a book”, and “reading a newspaper”. An investigation of daily activity videos of the elderly shows that the semantics of these activities are distinctly fine-grained. For example, “drinking water” and “taking medicine” are semantically very similar, and only a small number of video frames accurately reflect their category semantics. To address this problem in elderly activity recognition, this work proposes a new multimodal multi-granularity graph convolutional network (MM-GCN), which applies graph convolutional networks to four modalities, i.e., the skeleton (“point”), bone (“line”), frame (“frame”), and proposal (“segment”), to model the activities of the elderly and capture semantics at the four granularities of “point-line-frame-proposal”. Experiments on ETRI-Activity3D (110000+ videos, 50+ classes), the largest daily activity dataset for the elderly, validate the recognition performance of the proposed method: compared with state-of-the-art (SOTA) methods, MM-GCN achieves the highest recognition accuracy. In addition, to verify the robustness of MM-GCN on general human action recognition, experiments are also carried out on the NTU RGB+D benchmark, and the results show that MM-GCN is comparable to the SOTA methods.
© 2023 Chinese Academy of Sciences. All rights reserved.
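The abstract does not give the layer definitions of MM-GCN, so the following is only a minimal sketch of the general idea of applying a graph convolution to the "point" (joint) and "line" (bone) modalities. The 5-joint toy skeleton, the weight matrix, the helper names, and the choice to define a bone as a child joint minus its parent joint are all illustrative assumptions and are not taken from the paper.

```python
import math

# Hypothetical toy skeleton: (parent, child) joint pairs. Not the paper's layout.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(edges, n):
    """Return D^-1/2 (A + I) D^-1/2 as a nested list (self-loops included)."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for p, c in edges:
        a[p][c] = a[c][p] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(xi * yj for xi, yj in zip(row, col)) for col in zip(*y)]
            for row in x]


def graph_conv(a_hat, features, weight):
    """One graph convolution layer: ReLU(A_hat @ X @ W)."""
    out = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


def bones_from_joints(joints, edges):
    """Assumed bone ("line") feature: child minus parent; the root gets zeros."""
    dim = len(joints[0])
    bones = [[0.0] * dim for _ in joints]
    for p, c in edges:
        bones[c] = [jc - jp for jc, jp in zip(joints[c], joints[p])]
    return bones


# Toy 2D joint coordinates for a single frame.
joints = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [-1.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]  # 2 input -> 3 output channels

a_hat = normalized_adjacency(EDGES, NUM_JOINTS)
bones = bones_from_joints(joints, EDGES)
point_out = graph_conv(a_hat, joints, weight)  # "point" granularity
line_out = graph_conv(a_hat, bones, weight)    # "line" granularity
```

In this sketch both modalities share one adjacency matrix and one weight matrix purely for brevity. The frame and proposal granularities would need graphs built over video frames and temporal segments, which the abstract does not describe in enough detail to sketch.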
Pages: 2350-2364
Number of pages: 14