Fine-grained activity classification in assembly based on multi-visual modalities

Cited by: 20
Authors
Chen, Haodong [1 ]
Zendehdel, Niloofar [1 ]
Leu, Ming C. [1 ]
Yin, Zhaozheng [2 ,3 ]
Affiliations
[1] Missouri Univ Sci & Technol, Dept Mech & Aerosp Engn, Rolla, MO 65409 USA
[2] SUNY Stony Brook, Dept Biomed Informat, Stony Brook, NY USA
[3] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY USA
Funding
US National Science Foundation
Keywords
Fine-grained activity; Activity classification; Assembly; Multi-visual modality; Recognition; LSTM
DOI
10.1007/s10845-023-02152-x
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Assembly activity recognition and prediction help improve productivity, quality control, and safety in smart factories. This study aims to sense, recognize, and predict a worker's continuous fine-grained assembly activities on a manufacturing platform. We propose a two-stage network for fine-grained worker activity classification that leverages both scene-level and temporal-level activity features. The first stage is a feature awareness block that extracts scene-level features from multiple visual modalities, including red-green-blue (RGB) and hand skeleton frames. This stage uses transfer learning, and we compare three different pre-trained feature extraction models. The feature information from the first stage is then passed to the second stage, which learns the temporal-level features of activities and consists of Recurrent Neural Network (RNN) layers followed by a final classifier. We compare two RNN variants in the second stage: the Long Short-Term Memory (LSTM) network and the Gated Recurrent Unit (GRU). Fine-grained activities are predicted with a partial video observation method. In experiments on trimmed activity videos, our model achieves an accuracy above 99% on our dataset and above 98% on the public UCF 101 dataset, outperforming state-of-the-art models. The prediction model achieves above 97% accuracy in predicting activity labels from the first 50% of an activity video. In experiments on an untrimmed video of continuous assembly activities, combining our recognition and prediction models achieves above 91% accuracy in real time, surpassing state-of-the-art models for continuous assembly activity recognition.
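To make the two-stage design in the abstract concrete, below is a minimal sketch in PyTorch (an assumption; the record does not specify a framework). It covers only the RGB modality, not the hand skeleton stream, and uses a ResNet-18 backbone as a stand-in for the three pre-trained extractors the paper compares; all names and hyperparameters here are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

class TwoStageActivityClassifier(nn.Module):
    """Sketch of a two-stage network: a pretrained CNN extracts
    per-frame scene-level features (stage 1), and an RNN models
    temporal dynamics before a final classifier (stage 2)."""

    def __init__(self, num_classes, hidden_size=256):
        super().__init__()
        # Stage 1: transfer learning with a pretrained backbone.
        # ResNet-18 is an illustrative stand-in, not the paper's choice.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.feature_dim = backbone.fc.in_features   # 512 for ResNet-18
        backbone.fc = nn.Identity()                  # keep features, drop ImageNet head
        self.backbone = backbone

        # Stage 2: temporal modeling plus a final classifier.
        # The paper compares LSTM and GRU; LSTM is shown here.
        self.rnn = nn.LSTM(self.feature_dim, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.backbone(clips.reshape(b * t, c, h, w))  # (b*t, feature_dim)
        feats = feats.reshape(b, t, self.feature_dim)
        _, (h_n, _) = self.rnn(feats)                # last hidden state
        return self.classifier(h_n[-1])              # (batch, num_classes)

# Partial video observation: predict from only the onset 50% of frames.
model = TwoStageActivityClassifier(num_classes=10)
video = torch.randn(2, 16, 3, 224, 224)              # dummy clip batch
logits_full = model(video)                           # recognition on full clip
logits_half = model(video[:, : video.shape[1] // 2]) # prediction from first half
```

Swapping `nn.LSTM` for `nn.GRU` would give the second RNN variant the abstract mentions; the final call illustrates the partial video observation idea by truncating the clip to its first half before classification.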
Pages: 2215-2233 (19 pages)