A skeleton-based assembly action recognition method with feature fusion for human-robot collaborative assembly

Cited by: 1
Authors
Liu, Daxin [1 ,2 ]
Huang, Yu [1 ,2 ]
Liu, Zhenyu [1 ,2 ]
Mao, Haoyang [1 ,2 ]
Kan, Pengcheng [1 ,2 ]
Tan, Jianrong [1 ,2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab CAD & CG, Hangzhou 310058, Peoples R China
[2] Zhejiang Univ, Key Lab Intelligent Rescue Equipment Collapse Acci, Minist Emergency Management, Hangzhou 310058, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Human-robot collaborative assembly; Assembly action recognition; Multi-scale and multi-stream mechanism; Feature fusion mechanism; Transitional action classification; PREDICTION; SUPPORT; SYSTEM;
DOI
10.1016/j.jmsy.2024.08.019
Chinese Library Classification (CLC) number
T [Industrial Technology];
Subject classification code
08;
Abstract
Human-robot collaborative assembly (HRCA) is one of the current trends in intelligent manufacturing, and assembly action recognition is both the basis of and the key to HRCA. This paper proposes a multi-scale and multi-stream graph convolutional network (2MSGCN) for assembly action recognition. 2MSGCN takes a temporal skeleton sample as input and outputs the class of the assembly action to which the sample belongs. RGBD images of the operator performing assembly actions are captured by three RGBD cameras mounted at different viewpoints and pre-processed to generate a complete human skeleton. A multi-scale and multi-stream (2MS) mechanism and a feature fusion mechanism are proposed to improve the recognition accuracy of 2MSGCN. The 2MS mechanism feeds the skeleton data to 2MSGCN as a joint stream, a bone stream and a motion stream, while the joint stream further generates two sets of inputs at coarser scales to represent features of the higher-dimensional human skeleton, thereby capturing information at different scales and streams in the temporal skeleton samples. The feature fusion mechanism enables the fused feature to retain the information of each sub-feature while incorporating joint information across the sub-features. In addition, an improved convolution operation based on the Ghost module is introduced into 2MSGCN to reduce the number of parameters and floating-point operations (FLOPs) and to improve real-time performance. Considering that transitional actions occur when the operator switches between assembly actions in a continuous assembly process, a transitional action classification (TAC) method is proposed to distinguish transitional actions from assembly actions. Experiments on the public dataset NTU RGB+D 60 (NTU 60) and a self-built assembly action dataset indicate that the proposed 2MSGCN outperforms mainstream models in recognition accuracy and real-time performance.
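The bone and motion streams described in the abstract can be derived from the joint stream by standard constructions used in multi-stream skeleton models: bone vectors point from a joint's parent to the joint, and motion is the frame-to-frame displacement. The sketch below illustrates this under assumed array shapes and a toy 3-joint skeleton chain; the actual skeleton graph and preprocessing in the paper may differ.

```python
import numpy as np

def make_streams(joints, parents):
    """Derive bone and motion streams from a joint stream.

    joints  : (T, V, 3) array of 3D joint positions over T frames
    parents : length-V list; parents[v] is the parent joint of v
              (a root joint maps to itself, giving a zero bone vector)
    """
    # Bone stream: vector from each joint's parent to the joint itself.
    bones = joints - joints[:, parents, :]
    # Motion stream: frame-to-frame displacement of each joint
    # (zero for the first frame, which has no predecessor).
    motion = np.zeros_like(joints)
    motion[1:] = joints[1:] - joints[:-1]
    return bones, motion

# Toy example: 4 frames, 3 joints on a simple chain 0 <- 1 <- 2.
joints = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)
bones, motion = make_streams(joints, parents=[0, 0, 1])
print(bones.shape, motion.shape)  # (4, 3, 3) (4, 3, 3)
```

Each stream keeps the same (frames, joints, channels) layout as the joint stream, so all three can be fed to parallel branches of the same graph convolutional backbone.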
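The abstract's claim that a Ghost-module-based convolution reduces parameters and FLOPs can be checked with a rough parameter count: a Ghost module (Han et al., GhostNet) replaces part of a standard convolution with cheap depthwise operations that generate "ghost" feature maps. The channel counts and kernel sizes below are illustrative assumptions, not values from the 2MSGCN paper.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k, ratio=2, dk=3):
    """Weight count of a Ghost-style convolution.

    A primary conv produces c_out / ratio intrinsic maps; cheap dk x dk
    depthwise ops generate the remaining (ratio - 1) ghost maps per
    intrinsic map.
    """
    intrinsic = c_out // ratio
    return conv_params(c_in, intrinsic, k) + intrinsic * (ratio - 1) * dk * dk

std = conv_params(64, 128, 3)     # 73728
ghost = ghost_params(64, 128, 3)  # 36864 + 576 = 37440
print(std, ghost, round(std / ghost, 2))
```

With the default ratio of 2, the Ghost variant uses roughly half the weights of the standard convolution; since FLOPs scale with the same products of channel and kernel sizes, the FLOP reduction is similar.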
Pages: 553-566
Page count: 14