Active Vision in Binocular Depth Estimation: A Top-Down Perspective

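Depth estimation is an ill-posed problem; objects of different shapes or dimensions, even if at different distances, may project to the same image on the retina. Our brain uses several cues for depth estimation, including monocular cues such as motion parallax and binocular cues such as diplopia. However, it remains unclear how the computations required for depth estimation are implemented in biologically plausible ways. State-of-the-art approaches to depth estimation based on deep neural networks implicitly describe the brain as a hierarchical feature detector. Instead, in this paper we propose an alternative approach that casts depth estimation as a problem of active inference. We show that depth can be inferred by inverting a hierarchical generative model that simultaneously predicts the eyes’ projections from a 2D belief over an object. Model inversion consists of a series of biologically plausible homogeneous transformations based on Predictive Coding principles. Under the plausible assumption of a nonuniform fovea resolution, depth estimation favors an active vision strategy that fixates the object with the eyes, rendering the depth belief more accurate. This strategy is not realized by first fixating on a target and then estimating the depth; instead, it combines the two processes through action–perception cycles, through a mechanism similar to that of saccades during object recognition. The proposed approach requires only local (top-down and bottom-up) message passing, which can be implemented in biologically plausible neural circuits.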
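The idea summarized in the abstract, inferring depth by inverting a generative model that predicts both eyes' projections, can be illustrated with a heavily simplified sketch. It is not the authors' model: it assumes two parallel pinhole eyes with a 1D retina, no eye rotation, no hierarchy, and uniform resolution, and it updates a 2D belief (x, z) by plain gradient descent on the summed squared prediction errors. All names and constants (`FOCAL`, `BASELINE`, `project`, `infer_depth`, the learning rate) are hypothetical and in arbitrary units.

```python
import numpy as np

FOCAL = 1.0      # hypothetical focal length (arbitrary units)
BASELINE = 0.4   # hypothetical distance between the eyes (arbitrary units)
EYES = (-BASELINE / 2, BASELINE / 2)  # horizontal positions of left and right eye


def project(belief, eye_x):
    """Top-down prediction: pinhole projection of a 2D point (x, z) for one eye."""
    x, z = belief
    return FOCAL * (x - eye_x) / z


def project_jacobian(belief, eye_x):
    """Gradient of the predicted projection with respect to the belief (x, z)."""
    x, z = belief
    return np.array([FOCAL / z, -FOCAL * (x - eye_x) / z**2])


def infer_depth(obs_left, obs_right, initial_belief, lr=0.3, steps=3000):
    """Update the belief by descending the summed squared prediction errors."""
    belief = np.asarray(initial_belief, dtype=float).copy()
    for _ in range(steps):
        update = np.zeros(2)
        for eye_x, obs in zip(EYES, (obs_left, obs_right)):
            error = obs - project(belief, eye_x)   # bottom-up prediction error
            update += error * project_jacobian(belief, eye_x)
        belief += lr * update
    return belief


# Simulated observations for a target at x = 0.2, depth z = 1.0.
true_target = np.array([0.2, 1.0])
obs_left = project(true_target, EYES[0])
obs_right = project(true_target, EYES[1])

estimate = infer_depth(obs_left, obs_right, initial_belief=[0.0, 2.0])
print("estimated (x, z):", estimate)
```

In this toy setting the lateral position settles quickly while the depth component converges much more slowly, because depth is constrained only by the small difference between the two projections. The sketch does not include the action component (moving the eyes to fixate the object) that the paper argues makes the depth belief more accurate.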

Bibliographic Details
Main Authors: Matteo Priorelli, Giovanni Pezzulo, Ivilin Peev Stoianov
Format: Article
Language: English
Published: MDPI AG 2023-09-01
Series: Biomimetics
Subjects: active inference; depth perception; active vision; predictive coding; action-perception cycles
Online Access: https://www.mdpi.com/2313-7673/8/5/445
author Matteo Priorelli
Giovanni Pezzulo
Ivilin Peev Stoianov
collection DOAJ
description Depth estimation is an ill-posed problem; objects of different shapes or dimensions, even if at different distances, may project to the same image on the retina. Our brain uses several cues for depth estimation, including monocular cues such as motion parallax and binocular cues such as diplopia. However, it remains unclear how the computations required for depth estimation are implemented in biologically plausible ways. State-of-the-art approaches to depth estimation based on deep neural networks implicitly describe the brain as a hierarchical feature detector. Instead, in this paper we propose an alternative approach that casts depth estimation as a problem of active inference. We show that depth can be inferred by inverting a hierarchical generative model that simultaneously predicts the eyes’ projections from a 2D belief over an object. Model inversion consists of a series of biologically plausible homogeneous transformations based on Predictive Coding principles. Under the plausible assumption of a nonuniform fovea resolution, depth estimation favors an active vision strategy that fixates the object with the eyes, rendering the depth belief more accurate. This strategy is not realized by first fixating on a target and then estimating the depth; instead, it combines the two processes through action–perception cycles, through a mechanism similar to that of saccades during object recognition. The proposed approach requires only local (top-down and bottom-up) message passing, which can be implemented in biologically plausible neural circuits.
first_indexed 2024-03-10T23:00:31Z
format Article
id doaj.art-9bdd95c1806f4bcc88032cfdc696392b
institution Directory of Open Access Journals
issn 2313-7673
language English
last_indexed 2024-03-10T23:00:31Z
publishDate 2023-09-01
publisher MDPI AG
record_format Article
series Biomimetics
spelling doaj.art-9bdd95c1806f4bcc88032cfdc696392b
2023-11-19T09:44:36Z
eng
MDPI AG
Biomimetics
2313-7673
2023-09-01
Volume 8, Issue 5, Article 445
DOI 10.3390/biomimetics8050445
Active Vision in Binocular Depth Estimation: A Top-Down Perspective
Matteo Priorelli (Institute of Cognitive Sciences and Technologies, National Research Council of Italy, 35137 Padova, Italy)
Giovanni Pezzulo (Institute of Cognitive Sciences and Technologies, National Research Council of Italy, 00185 Rome, Italy)
Ivilin Peev Stoianov (Institute of Cognitive Sciences and Technologies, National Research Council of Italy, 35137 Padova, Italy)
title Active Vision in Binocular Depth Estimation: A Top-Down Perspective
topic active inference
depth perception
active vision
predictive coding
action-perception cycles
url https://www.mdpi.com/2313-7673/8/5/445