A Robust Light-Weight Fused-Feature Encoder-Decoder Model for Monocular Facial Depth Estimation From Single Images Trained on Synthetic Data

Bibliographic Details
Main Authors: Faisal Khan, Waseem Shariff, Muhammad Ali Farooq, Shubhajit Basak, Peter Corcoran
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Facial depth estimation, feature fusion, encoder-decoder architecture, deep learning
Online Access: https://ieeexplore.ieee.org/document/10103585/
_version_ 1797834801620713472
author Faisal Khan
Waseem Shariff
Muhammad Ali Farooq
Shubhajit Basak
Peter Corcoran
author_facet Faisal Khan
Waseem Shariff
Muhammad Ali Farooq
Shubhajit Basak
Peter Corcoran
author_sort Faisal Khan
collection DOAJ
description Due to the real-time acquisition and reasonable cost of consumer cameras, monocular depth maps have been employed in a variety of visual applications. However, despite ongoing research in depth estimation, such maps continue to suffer from low accuracy and considerable sensor noise. To improve the prediction of depth maps, this paper proposes a lightweight neural facial depth estimation model that operates on single image frames. Following a basic encoder-decoder design, features are extracted by initializing the encoder with a high-performance pre-trained network, and high-quality facial depth maps are reconstructed with a simple decoder. A feature fusion module allows the model to exploit pixel-level representations and recover fine details of facial features and boundaries. When tested and evaluated across four public facial depth datasets, the proposed network provides reliable, state-of-the-art results with significantly lower computational complexity and fewer parameters. The training procedure relies primarily on synthetic human facial images, which provide consistent ground-truth depth maps, and the use of an appropriate loss function leads to higher performance. Numerous experiments have been performed to validate and demonstrate the usefulness of the proposed approach. Finally, the model outperforms existing comparable facial depth networks in generalization ability and robustness across different test datasets, setting a new baseline for facial depth estimation.
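The abstract describes the architecture only at a high level. As a rough illustration of the general idea, the sketch below builds an encoder-decoder depth network in PyTorch with a pre-trained backbone and a concatenation-based feature fusion decoder. The backbone choice (torchvision's MobileNetV2), the feature-tap points, the channel widths, and fusion by concatenation are assumptions made for illustration only, not the configuration published in the paper.

```python
# Hypothetical sketch of a lightweight fused-feature encoder-decoder for facial
# depth estimation. Backbone, feature taps, channel widths, and the
# concatenation-based fusion are illustrative assumptions, not the authors'
# published model.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class FusionBlock(nn.Module):
    """Upsample the decoder feature, concatenate an encoder skip feature, refine."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear", align_corners=False)
        return self.refine(torch.cat([x, skip], dim=1))


class FacialDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
        feats = backbone.features
        # Split the pre-trained encoder into stages so multi-scale features can be fused.
        self.stage1 = feats[:2]    # 16 ch,   1/2 resolution
        self.stage2 = feats[2:4]   # 24 ch,   1/4 resolution
        self.stage3 = feats[4:7]   # 32 ch,   1/8 resolution
        self.stage4 = feats[7:14]  # 96 ch,   1/16 resolution
        self.stage5 = feats[14:]   # 1280 ch, 1/32 resolution
        # Lightweight decoder: progressively fuse encoder features back in.
        self.fuse4 = FusionBlock(1280, 96, 128)
        self.fuse3 = FusionBlock(128, 32, 64)
        self.fuse2 = FusionBlock(64, 24, 32)
        self.fuse1 = FusionBlock(32, 16, 16)
        self.head = nn.Conv2d(16, 1, 3, padding=1)  # single-channel depth map

    def forward(self, x):
        s1 = self.stage1(x)
        s2 = self.stage2(s1)
        s3 = self.stage3(s2)
        s4 = self.stage4(s3)
        s5 = self.stage5(s4)
        d = self.fuse4(s5, s4)
        d = self.fuse3(d, s3)
        d = self.fuse2(d, s2)
        d = self.fuse1(d, s1)
        d = F.interpolate(d, scale_factor=2, mode="bilinear", align_corners=False)
        return self.head(d)


if __name__ == "__main__":
    model = FacialDepthNet().eval()
    with torch.no_grad():
        depth = model(torch.randn(1, 3, 224, 224))
    print(depth.shape)  # torch.Size([1, 1, 224, 224])
```

The progressive fusion of skip features is what lets a small decoder recover sharp facial boundaries while keeping the parameter count low; the pre-trained encoder supplies the general-purpose features so that training on synthetic face renders remains feasible.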
first_indexed 2024-04-09T14:43:16Z
format Article
id doaj.art-42d2f3e1111f4ece8944de0a05e66cc3
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-09T14:43:16Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-42d2f3e1111f4ece8944de0a05e66cc3
2023-05-02T23:00:44Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
Vol. 11, pp. 41480-41491
doi: 10.1109/ACCESS.2023.3267970
Article number: 10103585
A Robust Light-Weight Fused-Feature Encoder-Decoder Model for Monocular Facial Depth Estimation From Single Images Trained on Synthetic Data
Faisal Khan (https://orcid.org/0000-0002-8391-6203), School of Engineering, National University of Ireland Galway (NUIG), Galway, Ireland
Waseem Shariff (https://orcid.org/0000-0001-7298-9389), School of Engineering, National University of Ireland Galway (NUIG), Galway, Ireland
Muhammad Ali Farooq (https://orcid.org/0000-0003-4116-8021), School of Engineering, National University of Ireland Galway (NUIG), Galway, Ireland
Shubhajit Basak (https://orcid.org/0000-0002-4078-874X), School of Computer Science, National University of Ireland Galway (NUIG), Galway, Ireland
Peter Corcoran (https://orcid.org/0000-0003-1670-4793), School of Engineering, National University of Ireland Galway (NUIG), Galway, Ireland
https://ieeexplore.ieee.org/document/10103585/
Facial depth estimation
feature fusion
encoder-decoder architecture
deep learning
spellingShingle Faisal Khan
Waseem Shariff
Muhammad Ali Farooq
Shubhajit Basak
Peter Corcoran
A Robust Light-Weight Fused-Feature Encoder-Decoder Model for Monocular Facial Depth Estimation From Single Images Trained on Synthetic Data
IEEE Access
Facial depth estimation
feature fusion
encoder-decoder architecture
deep learning
title A Robust Light-Weight Fused-Feature Encoder-Decoder Model for Monocular Facial Depth Estimation From Single Images Trained on Synthetic Data
title_full A Robust Light-Weight Fused-Feature Encoder-Decoder Model for Monocular Facial Depth Estimation From Single Images Trained on Synthetic Data
title_fullStr A Robust Light-Weight Fused-Feature Encoder-Decoder Model for Monocular Facial Depth Estimation From Single Images Trained on Synthetic Data
title_full_unstemmed A Robust Light-Weight Fused-Feature Encoder-Decoder Model for Monocular Facial Depth Estimation From Single Images Trained on Synthetic Data
title_short A Robust Light-Weight Fused-Feature Encoder-Decoder Model for Monocular Facial Depth Estimation From Single Images Trained on Synthetic Data
title_sort robust light weight fused feature encoder decoder model for monocular facial depth estimation from single images trained on synthetic data
topic Facial depth estimation
feature fusion
encoder-decoder architecture
deep learning
url https://ieeexplore.ieee.org/document/10103585/
work_keys_str_mv AT faisalkhan arobustlightweightfusedfeatureencoderdecodermodelformonocularfacialdepthestimationfromsingleimagestrainedonsyntheticdata
AT waseemshariff arobustlightweightfusedfeatureencoderdecodermodelformonocularfacialdepthestimationfromsingleimagestrainedonsyntheticdata
AT muhammadalifarooq arobustlightweightfusedfeatureencoderdecodermodelformonocularfacialdepthestimationfromsingleimagestrainedonsyntheticdata
AT shubhajitbasak arobustlightweightfusedfeatureencoderdecodermodelformonocularfacialdepthestimationfromsingleimagestrainedonsyntheticdata
AT petercorcoran arobustlightweightfusedfeatureencoderdecodermodelformonocularfacialdepthestimationfromsingleimagestrainedonsyntheticdata
AT faisalkhan robustlightweightfusedfeatureencoderdecodermodelformonocularfacialdepthestimationfromsingleimagestrainedonsyntheticdata
AT waseemshariff robustlightweightfusedfeatureencoderdecodermodelformonocularfacialdepthestimationfromsingleimagestrainedonsyntheticdata
AT muhammadalifarooq robustlightweightfusedfeatureencoderdecodermodelformonocularfacialdepthestimationfromsingleimagestrainedonsyntheticdata
AT shubhajitbasak robustlightweightfusedfeatureencoderdecodermodelformonocularfacialdepthestimationfromsingleimagestrainedonsyntheticdata
AT petercorcoran robustlightweightfusedfeatureencoderdecodermodelformonocularfacialdepthestimationfromsingleimagestrainedonsyntheticdata