MSSTNet: Multi-scale facial videos pulse extraction network based on separable spatiotemporal convolution and dimension separable attention

Cited by: 2
Authors
Zhao C. [1, 2]
Wang H. [1 ]
Feng Y. [1 ]
Affiliations
[1] College of Information Engineering, Zhejiang University of Technology, Hangzhou
[2] Hangzhou Innovation Institute, Beihang University, Hangzhou
Source
Virtual Reality and Intelligent Hardware | 2023 / Vol. 5 / No. 2
Funding
National Natural Science Foundation of China
Keywords
dimension separable attention; heart rate; multi-scale; neural network; remote photoplethysmography; separable spatiotemporal convolution
DOI
10.1016/j.vrih.2022.07.001
Abstract
Background: Estimating blood volume pulse without contact via remote photoplethysmography (rPPG) has been an active research topic in recent years. Existing methods are mainly based on a single-scale region of interest (ROI). However, some noise signals that are hard to separate in single-scale space can be separated easily in multi-scale space. In addition, existing spatiotemporal networks focus mainly on local spatiotemporal information and place too little emphasis on temporal information, which is crucial for pulse extraction, resulting in insufficient spatiotemporal feature modeling. Methods: This paper proposes a multi-scale facial video pulse extraction network based on separable spatiotemporal convolution and dimension separable attention. First, to address the limitations of a single-scale ROI, we construct a multi-scale feature space for initial signal separation. Second, separable spatiotemporal convolution and dimension separable attention are designed for efficient spatiotemporal correlation modeling; they increase information interaction between the long-span temporal and spatial dimensions and put more emphasis on temporal features. Results: The signal-to-noise ratio (SNR) of the proposed network reaches 9.58 dB on the PURE dataset and 6.77 dB on the UBFC-rPPG dataset, outperforming state-of-the-art algorithms. Conclusions: The results show that fusing multi-scale signals generally yields better results than methods based on a single-scale signal alone. The proposed separable spatiotemporal convolution and dimension separable attention mechanism contribute to more accurate pulse signal extraction. © 2022 Beijing Zhongke Journal Publishing Co. Ltd
Pages: 124-141
Page count: 17