Active Vision in Binocular Depth Estimation: A Top-Down Perspective
Depth estimation is an ill-posed problem; objects of different shapes or dimensions, even if at different distances, may project to the same image on the retina. Our brain uses several cues for depth estimation, including monocular cues such as motion parallax and binocular cues such as diplopia. Ho...
Main Authors: | Matteo Priorelli, Giovanni Pezzulo, Ivilin Peev Stoianov |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2023-09-01 |
Series: | Biomimetics |
Subjects: | active inference; depth perception; active vision; predictive coding; action-perception cycles |
Online Access: | https://www.mdpi.com/2313-7673/8/5/445 |
_version_ | 1797581107379568640 |
---|---|
author | Matteo Priorelli; Giovanni Pezzulo; Ivilin Peev Stoianov |
author_facet | Matteo Priorelli; Giovanni Pezzulo; Ivilin Peev Stoianov |
author_sort | Matteo Priorelli |
collection | DOAJ |
description | Depth estimation is an ill-posed problem; objects of different shapes or dimensions, even if at different distances, may project to the same image on the retina. Our brain uses several cues for depth estimation, including monocular cues such as motion parallax and binocular cues such as diplopia. However, it remains unclear how the computations required for depth estimation are implemented in biologically plausible ways. State-of-the-art approaches to depth estimation based on deep neural networks implicitly describe the brain as a hierarchical feature detector. Instead, in this paper we propose an alternative approach that casts depth estimation as a problem of active inference. We show that depth can be inferred by inverting a hierarchical generative model that simultaneously predicts the eyes’ projections from a 2D belief over an object. Model inversion consists of a series of biologically plausible homogeneous transformations based on Predictive Coding principles. Under the plausible assumption of a nonuniform fovea resolution, depth estimation favors an active vision strategy that fixates the object with the eyes, rendering the depth belief more accurate. This strategy is not realized by first fixating on a target and then estimating the depth; instead, it combines the two processes through action–perception cycles, using a mechanism similar to that of saccades during object recognition. The proposed approach requires only local (top-down and bottom-up) message passing, which can be implemented in biologically plausible neural circuits. |
first_indexed | 2024-03-10T23:00:31Z |
format | Article |
id | doaj.art-9bdd95c1806f4bcc88032cfdc696392b |
institution | Directory Open Access Journal |
issn | 2313-7673 |
language | English |
last_indexed | 2024-03-10T23:00:31Z |
publishDate | 2023-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Biomimetics |
spelling | doaj.art-9bdd95c1806f4bcc88032cfdc696392b; 2023-11-19T09:44:36Z; eng; MDPI AG; Biomimetics; ISSN 2313-7673; 2023-09-01; vol. 8, no. 5, art. 445; doi:10.3390/biomimetics8050445; Active Vision in Binocular Depth Estimation: A Top-Down Perspective; Matteo Priorelli (Institute of Cognitive Sciences and Technologies, National Research Council of Italy, 35137 Padova, Italy); Giovanni Pezzulo (Institute of Cognitive Sciences and Technologies, National Research Council of Italy, 00185 Rome, Italy); Ivilin Peev Stoianov (Institute of Cognitive Sciences and Technologies, National Research Council of Italy, 35137 Padova, Italy); https://www.mdpi.com/2313-7673/8/5/445; active inference; depth perception; active vision; predictive coding; action-perception cycles |
title | Active Vision in Binocular Depth Estimation: A Top-Down Perspective |
topic | active inference; depth perception; active vision; predictive coding; action-perception cycles |
url | https://www.mdpi.com/2313-7673/8/5/445 |
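
The description's opening claim, that a single retinal image does not determine depth while binocular cues can, may be illustrated with plain pinhole geometry. The sketch below is not the article's active inference model; it swaps in ordinary two-view triangulation with parallel eyes, and every name and number in it (`project`, `depth_from_disparity`, the 0.06 baseline, the unit focal length) is an assumption chosen for illustration.

```python
import math


def project(x, z, focal=1.0):
    """Pinhole projection of a point at lateral offset x and depth z (assumed model)."""
    return focal * x / z


def depth_from_disparity(x_left, x_right, baseline, focal=1.0):
    """Triangulate depth from the horizontal disparity seen by two parallel eyes."""
    disparity = x_left - x_right
    if disparity == 0:
        return math.inf  # zero disparity: the point is infinitely far away
    return focal * baseline / disparity


# Monocular ambiguity: an extent of 1 at depth 2 and an extent of 2 at depth 4
# land on the same image coordinate, so one eye cannot tell them apart.
small_near = project(1.0, 2.0)
large_far = project(2.0, 4.0)

# Binocular case: eyes sit at -baseline/2 and +baseline/2 on the x axis
# (illustrative values), both looking straight ahead at a point (px, pz).
baseline = 0.06
px, pz = 0.3, 2.0
x_left = project(px + baseline / 2, pz)   # offset measured from the left eye
x_right = project(px - baseline / 2, pz)  # offset measured from the right eye
z_hat = depth_from_disparity(x_left, x_right, baseline)

print(small_near == large_far)  # the two monocular projections coincide
print(z_hat)                    # recovered depth, close to pz
```

In this simplified setting the disparity alone fixes the depth. The article instead infers depth by inverting a generative model while the eyes move to fixate the object, which this static sketch does not attempt to reproduce.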