Vision Transformer-based pilot pose estimation

Cited: 0
Authors
Wu, Honglan [1 ]
Liu, Hao [1 ]
Sun, Youchao [1 ]
Affiliations
[1] College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing
Source
Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics | 2024, Vol. 50, No. 10
Funding
National Natural Science Foundation of China
Keywords
civil aircraft; convolutional neural network; explainability; intelligent cockpit; pilot pose estimation; self-attention;
DOI
10.13700/j.bh.1001-5965.2022.0811
Abstract
Human pose estimation is an important aspect of behavioral perception and a key technology for intelligent interaction in the civil aircraft cockpit. To establish an explainable link between the complex lighting environment in the civil aircraft cockpit and the performance of pilot pose estimation models, a Vision Transformer-based pilot pose estimation model (ViTPPose) is proposed. To capture the global correlation of higher-order features while enlarging the receptive field, the model appends a two-branch Transformer module with several coding layers to the end of a convolutional neural network (CNN) backbone; each coding layer combines self-attention with dilated convolution. Based on the flight crew's standard operating procedures, a pilot maneuvering-behavior keypoint detection dataset is established for flight simulation scenarios. The ViTPPose model performs pilot pose estimation on this dataset, and its validity is verified by comparison with benchmark models. Pose estimation heatmaps are generated under the cockpit's complex lighting conditions to examine the model's preferred lighting intensity, evaluate its performance under various lighting conditions, and reveal its dependence on lighting intensity. © 2024 Beijing University of Aeronautics and Astronautics (BUAA). All rights reserved.
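The abstract describes coding layers that pair a self-attention branch (global correlation) with a dilated-convolution branch (enlarged receptive field) on top of a CNN backbone's feature map. A minimal PyTorch sketch of one such two-branch layer is given below; the channel width, head count, dilation rate, and fusion-by-addition are illustrative assumptions, not the settings reported in the paper.

```python
import torch
import torch.nn as nn


class TwoBranchBlock(nn.Module):
    """Illustrative coding layer: self-attention branch + dilated-conv branch.

    Hyperparameters (channels=256, heads=8, dilation=2) are assumed defaults
    for the sketch, not values taken from the ViTPPose paper.
    """

    def __init__(self, channels=256, heads=8, dilation=2):
        super().__init__()
        # Branch 1: multi-head self-attention over the flattened feature map,
        # giving every spatial location a global view.
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)
        # Branch 2: 3x3 dilated convolution, enlarging the receptive field
        # without reducing spatial resolution (padding matches dilation).
        self.dilated = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # x: (B, C, H, W) feature map produced by a CNN backbone
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)               # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        attn_map = (
            self.norm(tokens + attn_out)                    # residual + norm
            .transpose(1, 2)
            .reshape(b, c, h, w)
        )
        return attn_map + self.dilated(x)                   # fuse both branches


# Example: a backbone feature map for a 256x192 crop at stride 16.
feat = torch.randn(1, 256, 16, 12)
out = TwoBranchBlock()(feat)
print(out.shape)  # → torch.Size([1, 256, 16, 12])
```

Because both branches preserve the feature-map shape, several such layers can be stacked, and a standard 1x1-convolution head can then regress one keypoint heatmap per joint.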
Pages: 3100-3110 (10 pages)