Enhanced Industrial Action Recognition Through Self-Supervised Visual Transformers

Times cited: 0
Authors
Xiao, Yao [1]
Xiang, Hua [1]
Wang, Tongxi [1]
Wang, Yiju [2]
Affiliations
[1] Yangtze University, College of Computer Science, Jingzhou 434025, Hubei, People's Republic of China
[2] Guangzhou Xinhua University, Department of Artificial Intelligence and Data Science, Guangzhou 510520, People's Republic of China
Source
IEEE Access | 2024 / Vol. 12
Keywords
Transformers; Training; Spatiotemporal phenomena; Production; Feature extraction; Computational modeling; Self-supervised learning; Data models; Custom industrial dataset; Action recognition; Spatiotemporal features; Visual transformer; Pretraining strategy
DOI
10.1109/ACCESS.2024.3455749
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Precise recognition of operator actions is crucial in industrial automation, both for improving production efficiency and for meeting safety standards. This study introduces a self-supervised pre-training framework based on visual transformers for industrial action recognition. The framework uses a Tube Masking strategy and draws on a comprehensive industrial dataset to capture spatiotemporal features effectively. On our custom-built industrial dataset, the model reaches a top-1 accuracy of 95%, indicating its practical applicability in real-world industrial environments. To assess its generalization, the model was also evaluated on several public datasets, achieving top-1 accuracies of 92.8% on UCF101, 87.1% on HMDB51, and 90.2% on Kinetics-400. These results indicate that the approach is robust and versatile, supporting its application in diverse industrial scenarios and further research.
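The abstract names a Tube Masking strategy but does not describe it. The sketch below assumes the VideoMAE-style definition: the video is split into space-time patch tokens, one random set of spatial patch positions is chosen, and that same set is hidden at every time slot, so each masked patch forms a "tube" through time. The helper names (`tube_mask`, `visible_indices`) are hypothetical. The example settings (16 frames at 224x224, 16x16 patches, tubelet size 2, mask ratio 0.9) are assumed VideoMAE-like defaults and are not taken from this paper.

```python
import random
from typing import List, Optional


def tube_mask(num_time_slots: int, grid_h: int, grid_w: int,
              mask_ratio: float, seed: Optional[int] = None) -> List[List[bool]]:
    """Return mask[t][s]; True means spatial patch s at time slot t is hidden.

    Hypothetical helper: one random set of spatial positions is drawn once
    and repeated for every time slot (tube masking, assumed VideoMAE-style).
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_spatial = grid_h * grid_w
    num_masked = int(round(mask_ratio * num_spatial))
    # Sample distinct spatial positions to hide.
    masked = set(rng.sample(range(num_spatial), num_masked))
    row = [s in masked for s in range(num_spatial)]
    # Copy the same spatial pattern to every time slot, forming tubes.
    return [list(row) for _ in range(num_time_slots)]


def visible_indices(mask: List[List[bool]]) -> List[int]:
    """Flattened token indices (t * num_spatial + s) that stay visible.

    In MAE-style pre-training only these tokens would be fed to the encoder.
    """
    num_spatial = len(mask[0]) if mask else 0
    return [t * num_spatial + s
            for t, row in enumerate(mask)
            for s, hidden in enumerate(row) if not hidden]


if __name__ == "__main__":
    # Assumed setup: 16 frames / tubelet size 2 -> 8 time slots;
    # 224 / 16 -> 14 x 14 spatial grid of patches.
    m = tube_mask(num_time_slots=8, grid_h=14, grid_w=14, mask_ratio=0.9, seed=0)
    print("tokens:", 8 * 14 * 14, "visible:", len(visible_indices(m)))
```

Under these assumptions, 176 of the 196 spatial positions are hidden at every time slot, so only 160 of the 1,568 tokens remain visible. Repeating the same spatial pattern over time is what keeps the model from reconstructing a hidden patch by copying it from a neighboring frame.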
Pages: 134133-134143
Number of pages: 11