Learning spatial-temporal deformable networks for unconstrained face alignment and tracking in videos

Cited by: 12
Authors
Zhu, Hongyu [1 ]
Liu, Hao [1 ,2 ]
Zhu, Congcong [1 ,3 ]
Deng, Zongyong [1 ]
Sun, Xuehong [1 ,2 ]
Affiliations
[1] Ningxia Univ, Sch Informat Engn, Yinchuan 750021, Ningxia, Peoples R China
[2] Collaborat Innovat Ctr Ningxia Big Data & Artific, Yinchuan 750021, Ningxia, Peoples R China
[3] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Face alignment; Face tracking; Spatial transformer; Relational reasoning; Video analysis; Biometrics; IMAGE;
DOI
10.1016/j.patcog.2020.107354
Chinese Library Classification (CLC) code
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a spatial-temporal deformable networks approach that addresses both face alignment in static images and face tracking in videos under unconstrained environments. Unlike conventional feature extraction, which cannot explicitly exploit augmented spatial geometry across varied facial shapes, our approach introduces a deformable hourglass networks (DHGN) method that learns a deformable mask to reduce the variance caused by facial deformation and to extract attentional facial regions for robust feature representation. However, DHGN extracts only spatial appearance features from static facial images and cannot explicitly exploit the temporal consistency information across consecutive frames in videos. For efficient temporal modeling, we further extend DHGN to a temporal DHGN (T-DHGN) paradigm tailored to video-based face alignment. To this end, T-DHGN incorporates a temporal relational reasoning module, so that the temporal order relationship among frames is encoded in the relational feature. T-DHGN then reasons about temporal offsets to select a subset of discriminative frames over time steps, allowing memorized temporal consistency information to flow across frames for stable landmark tracking in videos. Compared with most state-of-the-art methods, our approach achieves superior performance on a range of widely evaluated benchmark datasets. Code will be made publicly available upon publication. (C) 2020 Elsevier Ltd. All rights reserved.
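The abstract describes two components: a deformable mask that attends to landmark-relevant regions of hourglass features, and a temporal relational reasoning module that weighs frames by their pairwise relations. The following PyTorch sketch is a minimal conceptual illustration of how such components could be wired; the module names, layer sizes, and pairwise-relation design are assumptions for illustration, not the authors' released implementation.

```python
# Conceptual sketch only; all design choices below are illustrative assumptions.
import torch
import torch.nn as nn

class DeformableMask(nn.Module):
    """Learns a spatial attention mask over backbone (e.g. hourglass) features,
    emphasizing landmark-relevant facial regions before landmark regression."""
    def __init__(self, channels):
        super().__init__()
        self.mask_head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats):                              # feats: (B, C, H, W)
        mask = torch.sigmoid(self.mask_head(feats))        # (B, 1, H, W) in [0, 1]
        return feats * mask                                 # attended features

class TemporalRelationalReasoning(nn.Module):
    """Scores ordered pairs of frame features and turns the scores into
    per-frame weights, i.e. a soft selection of discriminative frames."""
    def __init__(self, dim):
        super().__init__()
        self.relation = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, frame_feats):                        # frame_feats: (B, T, D)
        B, T, D = frame_feats.shape
        a = frame_feats.unsqueeze(2).expand(B, T, T, D)
        b = frame_feats.unsqueeze(1).expand(B, T, T, D)
        pair = torch.cat([a, b], dim=-1)                   # all ordered frame pairs
        scores = self.relation(pair).squeeze(-1).mean(dim=2)   # (B, T)
        weights = torch.softmax(scores, dim=1)             # soft frame selection
        return (weights.unsqueeze(-1) * frame_feats).sum(dim=1)  # fused feature

# Toy usage with random tensors
attended = DeformableMask(256)(torch.randn(2, 256, 64, 64))
fused = TemporalRelationalReasoning(128)(torch.randn(2, 8, 128))
print(attended.shape, fused.shape)
```

In this sketch the frame weights play the role of the "subset of discriminative frames" mentioned in the abstract: frames with low relational scores contribute little to the fused temporal feature.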
Pages: 12