CWPR: An optimized transformer-based model for construction worker pose estimation on construction robots

Cited: 0
Authors
Zhou, Jiakai [1 ]
Zhou, Wanlin [1 ]
Wang, Yang [2 ,3 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Mech & Elect Engn, Nanjing 210000, Peoples R China
[2] Anhui Univ Technol, Sch Mech Engn, Maanshan 243000, Peoples R China
[3] Anhui Prov Key Lab Special Heavy Load Robot, Maanshan 243000, Peoples R China
Keywords
Construction worker pose; Construction robots; Transformer; Multi-human pose estimation; SURVEILLANCE VIDEOS; RECOGNITION;
DOI
10.1016/j.aei.2024.102894
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Estimating construction workers' poses is critically important for recognizing unsafe behaviors, conducting ergonomic analyses, and assessing productivity. Recently, using construction robots to capture RGB images for pose estimation has offered flexible monitoring perspectives and enabled timely interventions. However, existing multi-human pose estimation (MHPE) methods struggle to balance accuracy and speed, making them unsuitable for real-time applications on construction robots. This paper introduces the Construction Worker Pose Recognizer (CWPR), an optimized Transformer-based MHPE model tailored for construction robots. Specifically, CWPR uses a lightweight encoder equipped with a multi-scale feature fusion module to increase operational speed. An Intersection over Union (IoU)-aware query selection strategy then provides high-quality initial queries for the hybrid decoder, significantly improving performance. In addition, a decoder denoising module incorporates noisy ground truth into the decoder, mitigating sample imbalance and further improving accuracy. The Construction Worker Pose and Action (CWPA) dataset is also collected from 154 videos captured in real construction scenarios and annotated for two tasks: a pose benchmark for MHPE and an action benchmark for action recognition. Experiments demonstrate that CWPR achieves top-level accuracy and the fastest inference speed, attaining 68.1 Average Precision (AP) with a processing time of 26 ms on the COCO test set and 76.2 AP with 21 ms on the CWPA pose benchmark. Moreover, when integrated with the action recognition method ST-GCN on construction robot hardware, CWPR achieves 78.7 AP with a processing time of 19 ms on the CWPA action benchmark, validating its effectiveness for practical deployment.
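The IoU-aware query selection strategy mentioned in the abstract can be sketched roughly as follows: the encoder's flattened tokens are ranked by a learned per-token quality score (supervised, per the paper, to reflect IoU with ground truth), and the top-K tokens seed the decoder's initial queries. This is an illustrative sketch only, not the authors' implementation; the function name, array shapes, and the provenance of the scores are assumptions.

```python
import numpy as np

def iou_aware_query_select(encoder_feats: np.ndarray,
                           scores: np.ndarray,
                           num_queries: int) -> np.ndarray:
    """Select the top-scoring encoder tokens as initial decoder queries.

    encoder_feats: (tokens, dim) flattened multi-scale encoder features.
    scores: (tokens,) per-token quality scores; in the paper's strategy
            these would be trained to predict IoU with ground truth.
    Returns: (num_queries, dim) features used to initialize the queries.
    """
    # Rank tokens from highest to lowest score and keep the top K.
    top = np.argsort(scores)[::-1][:num_queries]
    return encoder_feats[top]

# Toy usage: 4 tokens of dimension 3; tokens 1 and 2 score highest.
feats = np.arange(12, dtype=float).reshape(4, 3)
scores = np.array([0.1, 0.9, 0.5, 0.3])
queries = iou_aware_query_select(feats, scores, num_queries=2)
```

In a full model the same top-K indices would also select the corresponding reference boxes, so that both the content queries and their positional anchors come from high-IoU candidates.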
Pages: 12