Fine-grained activity classification in assembly based on multi-visual modalities

Cited by: 22
Authors
Chen, Haodong [1 ]
Zendehdel, Niloofar [1 ]
Leu, Ming C. [1 ]
Yin, Zhaozheng [2 ,3 ]
Affiliations
[1] Missouri Univ Sci & Technol, Dept Mech & Aerosp Engn, Rolla, MO 65409 USA
[2] SUNY Stony Brook, Dept Biomed Informat, Stony Brook, NY USA
[3] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY USA
Funding
U.S. National Science Foundation
Keywords
Fine-grained activity; Activity classification; Assembly; Multi-visual modality; Recognition; LSTM
DOI
10.1007/s10845-023-02152-x
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Assembly activity recognition and prediction help to improve productivity, quality control, and safety in smart factories. This study aims to sense, recognize, and predict a worker's continuous fine-grained assembly activities on a manufacturing platform. We propose a two-stage network for fine-grained worker activity classification that leverages scene-level and temporal-level activity features. The first stage is a feature-awareness block that extracts scene-level features from multiple visual modalities, including red-green-blue (RGB) and hand-skeleton frames. We apply transfer learning in this stage and compare three pre-trained feature extraction models. The extracted features are then passed to the second stage, which learns the temporal-level features of activities and consists of Recurrent Neural Network (RNN) layers followed by a final classifier; we compare two RNN variants in this stage, the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). A partial-video-observation method is used to predict fine-grained activities before they are completed. In experiments on trimmed activity videos, our model achieves an accuracy of > 99% on our dataset and > 98% on the public UCF101 dataset, outperforming state-of-the-art models. The prediction model achieves an accuracy of > 97% in predicting activity labels from only the first 50% of an activity video. In experiments on an untrimmed video of continuous assembly activities, we combine our recognition and prediction models and achieve an accuracy of > 91% in real time, surpassing state-of-the-art models for continuous assembly activity recognition.
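To make the described pipeline concrete, below is a minimal sketch in Python with PyTorch (the framework is an assumption; the abstract does not name one). It illustrates the two-stage idea: a frozen pre-trained backbone extracts per-frame scene-level features via transfer learning, an LSTM or GRU models temporal-level features, and a linear layer produces the activity label. The ResNet-18 backbone, layer sizes, and single-stream input are illustrative choices, not the authors' exact architecture.

```python
# A minimal sketch of the two-stage idea from the abstract. Backbone choice,
# layer sizes, and the single-stream input are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class TwoStageActivityClassifier(nn.Module):
    def __init__(self, num_classes: int, rnn_type: str = "lstm", hidden: int = 256):
        super().__init__()
        # Stage 1: frozen pre-trained backbone (transfer learning).
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.feat_dim = backbone.fc.in_features      # 512 for ResNet-18
        backbone.fc = nn.Identity()                   # keep pooled features
        for p in backbone.parameters():
            p.requires_grad = False
        self.backbone = backbone
        # Stage 2: temporal model (LSTM or GRU) plus a final classifier.
        rnn_cls = nn.LSTM if rnn_type == "lstm" else nn.GRU
        self.rnn = rnn_cls(self.feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W) — e.g. RGB or rendered skeleton frames.
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))   # (b*t, feat_dim)
        feats = feats.view(b, t, self.feat_dim)
        out, _ = self.rnn(feats)                      # (b, t, hidden)
        return self.classifier(out[:, -1])            # label from last time step


# Partial-video observation: predict the label from only the onset of a clip,
# e.g. the first 50% of frames, mirroring the paper's prediction experiments.
model = TwoStageActivityClassifier(num_classes=10)
clip = torch.randn(2, 16, 3, 224, 224)                # toy batch of 16-frame clips
logits_full = model(clip)                             # full observation
logits_half = model(clip[:, : clip.shape[1] // 2])    # first 50% of frames only
print(logits_full.shape, logits_half.shape)           # torch.Size([2, 10]) each
```

In this sketch, prediction from partial observation needs no architectural change: the same recurrent model simply consumes a truncated frame sequence, which is one straightforward way to realize the 50%-onset evaluation reported in the abstract.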
Pages: 2215-2233 (19 pages)