A Robust Light-Weight Fused-Feature Encoder-Decoder Model for Monocular Facial Depth Estimation From Single Images Trained on Synthetic Data

Cited by: 1

Authors:

Khan, Faisal ^{[1]}

Shariff, Waseem ^{[1,3]}

Farooq, Muhammad Ali ^{[1]}

Basak, Shubhajit ^{[2]}

Corcoran, Peter ^{[1]}

Affiliations:

[1] Natl Univ Ireland Galway NUIG, Sch Engn, Galway H91 TK33, Ireland

[2] Natl Univ Ireland Galway NUIG, Sch Comp Sci, Galway H91 TK33, Ireland

[3] Xperi Inc, Galway H91V0TX, Ireland

Source:

IEEE ACCESS | 2023 / Vol. 11

Keywords:

Estimation; Facial recognition; Three-dimensional displays; Computational modeling; Encoding; Feature extraction; Cameras; Deep learning; Facial depth estimation; feature fusion; encoder-decoder architecture; deep learning; SYMMETRIC SHAPE;

DOI:

10.1109/ACCESS.2023.3267970

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Owing to the real-time acquisition and reasonable cost of consumer cameras, monocular depth maps have been employed in a variety of visual applications. Despite ongoing research, depth estimation methods continue to suffer from low accuracy and substantial sensor noise. To improve the prediction of depth maps, this paper proposes a lightweight neural facial depth estimation model that operates on single image frames. Following a basic encoder-decoder network design, features are extracted by an encoder initialized with a high-performance pre-trained network, and high-quality facial depth maps are reconstructed with a simple decoder. By employing a feature fusion module, the model can exploit pixel representations and recover full detail in facial features and boundaries. When evaluated across four public facial depth datasets, the proposed network delivers more reliable, state-of-the-art results with significantly lower computational complexity and fewer parameters. Training relies primarily on synthetic human facial images, which provide consistent ground-truth depth maps, and the use of an appropriate loss function leads to higher performance. Numerous experiments validate and demonstrate the usefulness of the proposed approach. Finally, the model outperforms existing comparative facial depth networks in generalization ability and robustness across different test datasets, setting a new baseline method for facial depth maps.
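
The abstract does not specify how the feature fusion module is built. As a rough illustration only, the sketch below shows one common form of encoder-decoder feature fusion: coarse decoder features are upsampled and concatenated channel-wise with finer encoder features. The function names (`upsample_nearest`, `fuse_features`), the nearest-neighbour upsampling, and the tensor shapes are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) feature map (illustrative choice)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_features(decoder_feat, encoder_skip):
    """Hypothetical fusion step: upsample coarse decoder features to the
    resolution of the encoder skip features, then concatenate along channels."""
    factor = encoder_skip.shape[1] // decoder_feat.shape[1]
    up = upsample_nearest(decoder_feat, factor)
    if up.shape[1:] != encoder_skip.shape[1:]:
        raise ValueError("spatial sizes must match after upsampling")
    return np.concatenate([up, encoder_skip], axis=0)


# Assumed shapes: 8 coarse channels at 4x4, 4 fine channels at 8x8.
coarse = np.random.rand(8, 4, 4)
fine = np.random.rand(4, 8, 8)
fused = fuse_features(coarse, fine)
print(fused.shape)  # (12, 8, 8)
```

In a trained network the concatenated tensor would typically pass through further convolutions before the depth map is predicted; that part is omitted here because the record gives no details about it.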

Citation

Pages: 41480-41491

Page count: 12

2 records in total

[1] An efficient encoder-decoder model for portrait depth estimation from single images trained on pixel-accurate synthetic data
Khan, Faisal; Hussain, Shahid; Basak, Shubhajit; Lemley, Joseph; Corcoran, Peter
NEURAL NETWORKS, 2021, 142: 479-491
[2] Encoder-Decoder Structure With the Feature Pyramid for Depth Estimation From a Single Image
Tang, Mengxia; Chen, Songnan; Dong, Ruifang; Kan, Jiangming
IEEE ACCESS, 2021, 9: 22640-22650
