Learning spatial-temporal deformable networks for unconstrained face alignment and tracking in videos

Cited by: 12
Authors
Zhu, Hongyu [1 ]
Liu, Hao [1 ,2 ]
Zhu, Congcong [1 ,3 ]
Deng, Zongyong [1 ]
Sun, Xuehong [1 ,2 ]
Affiliations
[1] Ningxia Univ, Sch Informat Engn, Yinchuan 750021, Ningxia, Peoples R China
[2] Collaborat Innovat Ctr Ningxia Big Data & Artific, Yinchuan 750021, Ningxia, Peoples R China
[3] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Face alignment; Face tracking; Spatial transformer; Relational reasoning; Video analysis; Biometrics; IMAGE;
DOI
10.1016/j.patcog.2020.107354
Chinese Library Classification (CLC) code
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a spatial-temporal deformable networks approach that addresses both face alignment in static images and face tracking in videos under unconstrained environments. Unlike conventional feature extraction, which cannot explicitly exploit augmented spatial geometry across varied facial shapes, our approach introduces a deformable hourglass networks (DHGN) method that learns a deformable mask to reduce the variance caused by facial deformation and to extract attentional facial regions for robust feature representation. However, DHGN extracts only spatial appearance features from static facial images and cannot explicitly exploit the temporal consistency information across consecutive frames in videos. For efficient temporal modeling, we further extend DHGN to a temporal DHGN (T-DHGN) paradigm tailored to video-based face alignment. To this end, T-DHGN incorporates a temporal relational reasoning module, so that the temporal order relationship among frames is encoded in the relational feature. T-DHGN then reasons about temporal offsets to select a subset of discriminative frames over time steps, allowing memorized temporal consistency information to flow across frames for stable landmark tracking in videos. Compared with most state-of-the-art methods, our approach achieves superior performance on a range of widely evaluated benchmark datasets. Code will be made publicly available upon publication. (C) 2020 Elsevier Ltd. All rights reserved.
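The abstract describes two components: a deformable mask that attends to landmark-relevant regions of hourglass features, and a temporal relational reasoning module that weighs frames by their pairwise relations. The following PyTorch sketch is a minimal conceptual illustration of how such components could be wired; the module names, layer sizes, and pairwise-relation design are assumptions for illustration, not the authors' released implementation.

```python
# Conceptual sketch only; all design choices below are illustrative assumptions.
import torch
import torch.nn as nn

class DeformableMask(nn.Module):
    """Learns a spatial attention mask over backbone (e.g. hourglass) features,
    emphasizing landmark-relevant facial regions before landmark regression."""
    def __init__(self, channels):
        super().__init__()
        self.mask_head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats):                              # feats: (B, C, H, W)
        mask = torch.sigmoid(self.mask_head(feats))        # (B, 1, H, W) in [0, 1]
        return feats * mask                                 # attended features

class TemporalRelationalReasoning(nn.Module):
    """Scores ordered pairs of frame features and turns the scores into
    per-frame weights, i.e. a soft selection of discriminative frames."""
    def __init__(self, dim):
        super().__init__()
        self.relation = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, frame_feats):                        # frame_feats: (B, T, D)
        B, T, D = frame_feats.shape
        a = frame_feats.unsqueeze(2).expand(B, T, T, D)
        b = frame_feats.unsqueeze(1).expand(B, T, T, D)
        pair = torch.cat([a, b], dim=-1)                   # all ordered frame pairs
        scores = self.relation(pair).squeeze(-1).mean(dim=2)   # (B, T)
        weights = torch.softmax(scores, dim=1)             # soft frame selection
        return (weights.unsqueeze(-1) * frame_feats).sum(dim=1)  # fused feature

# Toy usage with random tensors
attended = DeformableMask(256)(torch.randn(2, 256, 64, 64))
fused = TemporalRelationalReasoning(128)(torch.randn(2, 8, 128))
print(attended.shape, fused.shape)
```

In this sketch the frame weights play the role of the "subset of discriminative frames" mentioned in the abstract: frames with low relational scores contribute little to the fused temporal feature.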
Pages: 12