Learning spatial-temporal deformable networks for unconstrained face alignment and tracking in videos

Cited by: 12
Authors
Zhu, Hongyu [1 ]
Liu, Hao [1 ,2 ]
Zhu, Congcong [1 ,3 ]
Deng, Zongyong [1 ]
Sun, Xuehong [1 ,2 ]
Affiliations
[1] Ningxia Univ, Sch Informat Engn, Yinchuan 750021, Ningxia, Peoples R China
[2] Collaborat Innovat Ctr Ningxia Big Data & Artific, Yinchuan 750021, Ningxia, Peoples R China
[3] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Face alignment; Face tracking; Spatial transformer; Relational reasoning; Video analysis; Biometrics; IMAGE;
DOI
10.1016/j.patcog.2020.107354
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a spatial-temporal deformable networks approach to address both face alignment in static images and face tracking in videos under unconstrained environments. Unlike conventional feature extraction, which cannot explicitly exploit augmented spatial geometry for various facial shapes, our approach introduces a deformable hourglass networks (DHGN) method that learns a deformable mask to reduce the variance of facial deformation and to extract attentional facial regions for robust feature representation. However, DHGN extracts only spatial appearance features from static facial images and cannot explicitly exploit the temporal consistency information across consecutive frames in videos. For efficient temporal modeling, we further extend DHGN to a temporal DHGN (T-DHGN) paradigm designed specifically for video-based face alignment. To this end, T-DHGN incorporates a temporal relational reasoning module, so that the temporal order relationship among frames is encoded in the relational feature. In this way, T-DHGN reasons about temporal offsets to select a subset of discriminative frames over time steps, allowing memorized temporal consistency information to flow across frames for stable landmark tracking in videos. Compared with most state-of-the-art methods, our approach achieves superior performance on a range of widely evaluated benchmark datasets. Code will be made publicly available upon publication. (C) 2020 Elsevier Ltd. All rights reserved.
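The abstract's temporal relational reasoning idea can be illustrated with a minimal NumPy sketch: each frame is scored by aggregating pairwise relation features that combine two frames' features with their normalized temporal offset, and the top-scoring frames are selected. The relation weights, the tanh scoring function, and the top-k selection rule below are illustrative assumptions for exposition only, not the paper's actual T-DHGN architecture.

```python
import numpy as np

def temporal_relation_scores(frames):
    """Score each frame by aggregating toy pairwise relation features.

    frames: (T, D) array of per-frame feature vectors.
    The relation feature for a pair (i, j) concatenates both frames'
    features with the normalized temporal offset (j - i) / T, so the
    temporal order relationship is encoded in the relational feature.
    """
    T, D = frames.shape
    # Toy relation weights (uniform averaging); a learned network
    # would replace this in a real model.
    W = np.ones(2 * D + 1) / (2 * D + 1)
    scores = np.zeros(T)
    for i in range(T):
        for j in range(T):
            if i == j:
                continue
            rel = np.concatenate([frames[i], frames[j], [(j - i) / T]])
            scores[i] += np.tanh(rel @ W)
    return scores / (T - 1)

def select_frames(frames, k):
    """Pick the k most discriminative frames by relational score."""
    s = temporal_relation_scores(frames)
    return np.argsort(-s)[:k]
```

For example, if one frame's features stand out from otherwise uniform frames, its relation features dominate and it is ranked first; in a tracking pipeline the selected subset would then carry temporal consistency information across frames.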
Pages: 12