Vision-Based Multi-Modal Framework for Action Recognition

Cited by: 7
Authors
Romaissa, Beddiar Djamila [1 ,2 ]
Mourad, Oussalah [2 ]
Brahim, Nini [1 ]
Affiliations
[1] Univ Laarbi Ben Mhidi, Res Lab Comp Sci Complex Syst, Oum El Bouaghi, Algeria
[2] Univ Oulu, Ctr Machine Vis & Signal Anal, Oulu, Finland
Source
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2021
Keywords
FUSION
DOI
10.1109/ICPR48806.2021.9412863
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human activity recognition plays a central role in the development of intelligent systems for video surveillance, public security, health care and home monitoring, where detecting and recognizing activities can improve people's quality of life and security. Typically, automated, intuitive and real-time systems are required to recognize human activities and accurately identify unusual behaviors in order to prevent dangerous situations. In this work, we explore the combination of three modalities (RGB, depth and skeleton data) to design a robust multi-modal framework for vision-based human activity recognition. In particular, spatial information, body shape/posture and the temporal evolution of actions are highlighted using illustrative representations obtained from a combination of dynamic RGB images, dynamic depth images and skeleton data representations. Each video is thus represented by three images that summarize the ongoing action. Our framework takes advantage of transfer learning from pre-trained models to extract significant features from these newly created images. Next, we fuse the extracted features using Canonical Correlation Analysis and train a Long Short-Term Memory network to classify actions from the visual descriptive images. Experimental results demonstrate the reliability of our feature-fusion framework, which captures highly significant features and achieves state-of-the-art performance on the public UTD-MHAD and NTU RGB+D datasets.
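The fusion step named in the abstract, Canonical Correlation Analysis, can be illustrated with a minimal two-view sketch in NumPy. This is not the authors' implementation: the function name `cca_fuse`, the regularization term `reg`, and the choice to concatenate the two projected views are assumptions made for illustration, and the abstract does not say how the three modalities are combined.

```python
import numpy as np


def cca_fuse(X, Y, n_components, reg=1e-3):
    """Two-view CCA fusion sketch (hypothetical helper, not from the paper).

    X: (n_samples, p) features from one modality.
    Y: (n_samples, q) features from another modality.
    Returns the concatenated projections and the canonical correlations.
    """
    # Center each view.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Covariance matrices; `reg` is an assumed ridge term for stability.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)

    # Singular values of the whitened cross-covariance are the
    # (regularized) canonical correlations, in descending order.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt.T[:, :n_components]

    # Fuse by concatenating the two projected views (one common choice).
    fused = np.hstack([Xc @ Wx, Yc @ Wy])
    return fused, s[:n_components]
```

In a pipeline like the one described, `X` and `Y` would be feature vectors extracted by pre-trained networks from two of the three summary images, and the fused output would be passed to the classifier.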
Pages: 5859 - 5866
Number of pages: 8