Efficient face detection and tracking in video sequences based on deep learning

Times cited: 23
Authors
Zheng, Guangyong [1, 2]
Xu, Yuming [3 ]
Affiliations
[1] Hengyang Normal Univ, Coll Comp Sci & Technol, Hengyang 421002, Hunan, Peoples R China
[2] Hunan Prov Key Lab Intelligent Informat Proc & Ap, Hengyang 421002, Hunan, Peoples R China
[3] Changsha Normal Univ, Coll Informat Sci & Engn, Changsha 410100, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Face detection; Face tracking; Regression network; Correction network; Mean shift
DOI
10.1016/j.ins.2021.03.027
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Video-based face detection and tracking technology has been widely used in video surveillance, safe driving, and medical diagnosis. In video sequences, most existing face detection and tracking methods suffer from interference caused by occlusion, ambient illumination, and changes in human posture. To track human faces accurately in video sequences, we propose an efficient deep-learning-based face detection and tracking framework consisting of a SENResNet face detection model and a Regression Network-based Face Tracking (RNFT) model. First, the SENResNet model integrates the Squeeze-and-Excitation Network (SEN) with the Residual Neural Network (ResNet). Because deep neural networks are difficult to train, we use ResNet to overcome the vanishing-gradient problem in deep network training. To fuse the features of each channel during the convolution operation, we further integrate the SEN module into the SENResNet model. SENResNet accurately detects facial information in each frame and extracts the position of the target face, thereby providing an initialization window for face tracking. Then, the RNFT model extracts facial features from adjacent frames and predicts the position of the target face in the next frame. To address the problem of feature scaling, we add a correction network to the RNFT model. The improved RNFT model extracts the rectangular bounding box of the target face in the previous frame and strengthens its perception of feature scaling, thereby improving its accuracy. Extensive experimental results on public face and video datasets show that the proposed SENResNet and RNFT models outperform state-of-the-art methods in terms of accuracy and performance. (c) 2021 Elsevier Inc. All rights reserved.
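The SENResNet idea described in the abstract (channel reweighting by a Squeeze-and-Excitation module inside a residual block) can be illustrated with a minimal, dependency-free sketch. It assumes the standard SE-ResNet layout from the original Squeeze-and-Excitation work, in which the SE gates rescale the residual branch F(x) before the identity shortcut is added. The paper's exact SE placement, reduction ratio, and layer sizes are not given in this abstract, and the helper names (`squeeze`, `excitation`, `se_residual_block`) are hypothetical.

```python
import math
from typing import Callable, List

# A feature map is a list of channels; each channel is a 2-D list (H x W) of floats.
FeatureMap = List[List[List[float]]]


def squeeze(fmap: FeatureMap) -> List[float]:
    """Squeeze step: global average pooling gives one descriptor per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer; `weights` has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def excitation(z: List[float],
               w1: List[List[float]], b1: List[float],
               w2: List[List[float]], b2: List[float]) -> List[float]:
    """Excitation step: FC -> ReLU -> FC -> sigmoid, one gate in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]


def se_residual_block(x: FeatureMap,
                      residual_branch: Callable[[FeatureMap], FeatureMap],
                      w1: List[List[float]], b1: List[float],
                      w2: List[List[float]], b2: List[float]) -> FeatureMap:
    """Assumed SE-ResNet layout: out = ReLU(x + gates * F(x)), gates from SE on F(x)."""
    fx = residual_branch(x)  # F(x): stacked convolutions in a real network
    gates = excitation(squeeze(fx), w1, b1, w2, b2)
    # Rescale each channel of F(x) by its gate, add the identity shortcut, apply ReLU.
    return [
        [[max(0.0, xv + g * fv) for xv, fv in zip(xrow, frow)]
         for xrow, frow in zip(xch, fch)]
        for xch, fch, g in zip(x, fx, gates)
    ]
```

In a real SENResNet, `residual_branch` would be the convolutions of a ResNet block, and the first dense layer would map C channels to C/r hidden units for some reduction ratio r (the original SE-Net work uses r = 16 by default). Gating F(x) rather than the sum keeps the identity shortcut unscaled, which preserves the direct gradient path that motivates using ResNet in the first place.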
Pages: 265-285
Number of pages: 21