Attention-Driven Cropping for Very High Resolution Facial Landmark Detection

Cited by: 54

Authors:

Chandran, Prashanth [1,2]

Bradley, Derek [2]

Gross, Markus [1,2]

Beeler, Thabo [2]

Affiliations:

[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland

[2] DisneyRes Studios, Zurich, Switzerland

Source:

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2020

Keywords:

FACE ALIGNMENT; NETWORK; CASCADE;

DOI:

10.1109/CVPR42600.2020.00590

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Facial landmark detection is a fundamental task for many consumer and high-end applications and is almost entirely solved by machine learning methods today. Existing datasets used to train such algorithms are primarily made up of only low resolution images, and current algorithms are limited to inputs of quality and resolution comparable to those of the training dataset. On the other hand, high resolution imagery is becoming increasingly common as consumer cameras improve in quality every year. Therefore, there is a need for algorithms that can leverage the rich information available in high resolution imagery. Naively attempting to reuse existing network architectures on high resolution imagery is prohibitive due to memory bottlenecks on GPUs. The only current solution is to downsample the images, sacrificing resolution and quality. Building on top of recent progress in attention-based networks, we present a novel, fully convolutional regional architecture that is specially designed for predicting landmarks on very high resolution facial images without downsampling. We demonstrate the flexibility of our architecture by training the proposed model with images of resolutions ranging from 256 x 256 to 4K. In addition to being the first method for facial landmark detection on high resolution images, our approach achieves superior performance over traditional (holistic) state-of-the-art architectures across ALL resolutions, leading to a general-purpose, extremely flexible, high quality landmark detector.

Pages: 5860-5869

Page count: 10
