A Robust Light-Weight Fused-Feature Encoder-Decoder Model for Monocular Facial Depth Estimation From Single Images Trained on Synthetic Data
Because consumer cameras offer real-time acquisition at reasonable cost, monocular depth maps have been employed in a variety of visual applications. Despite ongoing research in depth estimation, these maps continue to suffer from low accuracy and substantial sensor noise. To improve the prediction of dep...
Main Authors: | Faisal Khan, Waseem Shariff, Muhammad Ali Farooq, Shubhajit Basak, Peter Corcoran |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | Facial depth estimation, feature fusion, encoder-decoder architecture, deep learning |
Online Access: | https://ieeexplore.ieee.org/document/10103585/ |
_version_ | 1797834801620713472 |
---|---|
author | Faisal Khan Waseem Shariff Muhammad Ali Farooq Shubhajit Basak Peter Corcoran |
author_sort | Faisal Khan |
collection | DOAJ |
description | Because consumer cameras offer real-time acquisition at reasonable cost, monocular depth maps have been employed in a variety of visual applications. Despite ongoing research in depth estimation, these maps continue to suffer from low accuracy and substantial sensor noise. To improve the prediction of depth maps, this paper proposes a lightweight neural facial depth estimation model that operates on single image frames. Following a basic encoder-decoder design, the model extracts features with an encoder initialized from a high-performance pre-trained network and reconstructs high-quality facial depth maps with a simple decoder. A feature fusion module lets the model use pixel representations and recover full detail in facial features and boundaries. Evaluated on four public facial depth datasets, the proposed network delivers more reliable, state-of-the-art results with significantly lower computational complexity and fewer parameters. Training relies primarily on synthetic human facial images, which provide consistent ground-truth depth maps, and an appropriate loss function further improves performance. Numerous experiments validate and demonstrate the usefulness of the proposed approach. Finally, the model outperforms existing facial depth networks in generalization ability and robustness across different test datasets, setting a new baseline for facial depth maps. |
first_indexed | 2024-04-09T14:43:16Z |
format | Article |
id | doaj.art-42d2f3e1111f4ece8944de0a05e66cc3 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-09T14:43:16Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-42d2f3e1111f4ece8944de0a05e66cc3; 2023-05-02T23:00:44Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11, pp. 41480-41491; DOI 10.1109/ACCESS.2023.3267970; document 10103585; A Robust Light-Weight Fused-Feature Encoder-Decoder Model for Monocular Facial Depth Estimation From Single Images Trained on Synthetic Data; Faisal Khan (https://orcid.org/0000-0002-8391-6203), School of Engineering, National University of Ireland Galway (NUIG), Galway, Ireland; Waseem Shariff (https://orcid.org/0000-0001-7298-9389), School of Engineering, National University of Ireland Galway (NUIG), Galway, Ireland; Muhammad Ali Farooq (https://orcid.org/0000-0003-4116-8021), School of Engineering, National University of Ireland Galway (NUIG), Galway, Ireland; Shubhajit Basak (https://orcid.org/0000-0002-4078-874X), School of Computer Science, National University of Ireland Galway (NUIG), Galway, Ireland; Peter Corcoran (https://orcid.org/0000-0003-1670-4793), School of Engineering, National University of Ireland Galway (NUIG), Galway, Ireland; https://ieeexplore.ieee.org/document/10103585/; Facial depth estimation; feature fusion; encoder-decoder architecture; deep learning |
title | A Robust Light-Weight Fused-Feature Encoder-Decoder Model for Monocular Facial Depth Estimation From Single Images Trained on Synthetic Data |
topic | Facial depth estimation feature fusion encoder-decoder architecture deep learning |
url | https://ieeexplore.ieee.org/document/10103585/ |