Vision-Based Multi-Modal Framework for Action Recognition

Cited by: 7

Authors:

Romaissa, Beddiar Djamila [1,2]

Mourad, Oussalah [2]

Brahim, Nini [1]

Affiliations:

[1] Univ Laarbi Ben Mhidi, Res Lab Comp Sci Complex Syst, Oum El Bouaghi, Algeria

[2] Univ Oulu, Ctr Machine Vis & Signal Anal, Oulu, Finland

Source:

2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2021

Keywords:

FUSION

DOI:

10.1109/ICPR48806.2021.9412863

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Human activity recognition plays a central role in the development of intelligent systems for video surveillance, public security, health care and home monitoring, where detecting and recognizing activities can improve people's quality of life and security. Such applications typically require automated, intuitive and real-time systems that recognize human activities and accurately identify unusual behaviors in order to prevent dangerous situations. In this work, we explore the combination of three modalities (RGB, depth and skeleton data) to design a robust multi-modal framework for vision-based human activity recognition. In particular, spatial information, body shape/posture and the temporal evolution of actions are highlighted through illustrative representations built from dynamic RGB images, dynamic depth images and skeleton data representations, so that each video is represented by three images that summarize the ongoing action. Our framework uses transfer learning from pre-trained models to extract significant features from these newly created images. Next, we fuse the extracted features using Canonical Correlation Analysis and train a Long Short-Term Memory network to classify actions from these descriptive images. Experimental results demonstrate the reliability of our feature-fusion framework, which captures highly significant features and achieves state-of-the-art performance on the public UTD-MHAD and NTU RGB+D datasets.
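The abstract does not state how the dynamic RGB and dynamic depth images are computed. One common construction, assumed here rather than taken from the paper, is approximate rank pooling (Bilen et al., 2016), which collapses a clip of T frames I_1..I_T into one image d = sum_t alpha_t * I_t, with alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}), where H_t is the t-th harmonic number and H_0 = 0. The minimal pure-Python sketch below treats each frame as a hypothetical list of pixel rows (the `Frame` type is illustrative only); a real pipeline would use an array library and the paper's own preprocessing.

```python
from typing import List, Sequence

# Hypothetical frame type: one single-channel frame as rows of pixel values.
Frame = List[List[float]]


def arp_coefficients(num_frames: int) -> List[float]:
    """Approximate rank pooling weights alpha_t for t = 1..T (assumed scheme).

    alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}), with H_0 = 0.
    """
    T = num_frames
    # harmonic[t] holds the t-th harmonic number H_t; harmonic[0] = 0.
    harmonic = [0.0]
    for i in range(1, T + 1):
        harmonic.append(harmonic[-1] + 1.0 / i)
    return [
        2.0 * (T - t + 1) - (T + 1) * (harmonic[T] - harmonic[t - 1])
        for t in range(1, T + 1)
    ]


def dynamic_image(frames: Sequence[Frame]) -> Frame:
    """Collapse a clip into a single image as a weighted sum of its frames."""
    if not frames:
        raise ValueError("need at least one frame")
    alphas = arp_coefficients(len(frames))
    rows, cols = len(frames[0]), len(frames[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    # Accumulate alpha_t * I_t pixel by pixel.
    for alpha, frame in zip(alphas, frames):
        for r in range(rows):
            for c in range(cols):
                out[r][c] += alpha * frame[r][c]
    return out
```

Because these weights sum to zero, a clip with no motion (identical frames) maps to an all-zero image, so the result reflects how the content changes over time rather than its static appearance. For a two-frame clip the weights are -0.5 and 0.5, giving half the difference between the second and first frames.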

Pages: 5859-5866

Number of pages: 8
