N2F2: hierarchical scene understanding with nested neural feature fields

Bibliographic Details
Main Authors: Bhalgat, YS, Laina, I, Henriques, J, Zisserman, A, Vedaldi, A
Format: Conference item
Language: English
Published: Springer 2024
Collection: OXFORD
Description: Understanding complex scenes at multiple levels of abstraction remains a formidable challenge in computer vision. To address this, we introduce Nested Neural Feature Fields (N2F2), a novel approach that employs hierarchical supervision to learn a single feature field, wherein different dimensions within the same high-dimensional feature encode scene properties at varying granularities. Our method allows for a flexible definition of hierarchies, tailored to either the physical dimensions or semantics or both, thereby enabling a comprehensive and nuanced understanding of scenes. We leverage a 2D class-agnostic segmentation model to provide semantically meaningful pixel groupings at arbitrary scales in the image space, and query the CLIP vision-encoder to obtain language-aligned embeddings for each of these segments. Our proposed hierarchical supervision method then assigns different nested dimensions of the feature field to distill the CLIP embeddings using deferred volumetric rendering at varying physical scales, creating a coarse-to-fine representation. Extensive experiments show that our approach outperforms the state-of-the-art feature field distillation methods on tasks such as open-vocabulary 3D segmentation and localization, demonstrating the effectiveness of the learned nested feature field.
First indexed: 2024-09-25T04:24:09Z
Record ID: oxford-uuid:cedfd58a-d82e-4632-8d78-90811757ba74
Institution: University of Oxford
Last indexed: 2024-12-09T03:37:46Z
Record format: dspace
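The description summarizes the method only at a high level. As a rough illustration of two ideas it names, deferred volumetric rendering (here taken to mean: composite a compact feature along the ray first, then decode it once per pixel) and nested supervision of different dimensions of one feature vector, the following is a minimal pure-Python sketch. Everything concrete in it is an assumption for illustration and not the authors' implementation: the function names, NeRF-style alpha compositing, the hypothetical per-scale linear decoders into the CLIP embedding space, the 1 - cosine loss, and the choice of which prefix length serves which scale.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (out_dim, in_dim)


def render_feature(densities: Sequence[float], deltas: Sequence[float],
                   features: Sequence[Sequence[float]]) -> Vector:
    """Alpha-composite per-sample features along one ray (NeRF-style quadrature)."""
    dim = len(features[0])
    out = [0.0] * dim
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for sigma, delta, feat in zip(densities, deltas, features):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        for j in range(dim):
            out[j] += weight * feat[j]
        transmittance *= 1.0 - alpha
    return out


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity, guarded against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, eps)


def nested_distillation_loss(rendered: Sequence[float],
                             prefix_dims: Sequence[int],
                             decoders: Sequence[Matrix],
                             targets: Sequence[Sequence[float]]) -> float:
    """Average (1 - cosine) over scales; scale k only sees rendered[:prefix_dims[k]].

    ASSUMPTION: each scale has its own hypothetical linear decoder mapping its
    prefix to the target (CLIP) embedding dimension.
    """
    losses = []
    for d, dec, target in zip(prefix_dims, decoders, targets):
        prefix = rendered[:d]            # nested slice of the same rendered feature
        predicted = matvec(dec, prefix)  # deferred decoding: once per pixel, after rendering
        losses.append(1.0 - cosine(predicted, target))
    return sum(losses) / len(losses)


# Usage: the first sample is empty (density 0), the second is effectively opaque.
rendered = render_feature([0.0, 1e3], [1.0, 1.0],
                          [[9.0, 9.0, 9.0, 9.0], [1.0, 0.0, 0.0, 1.0]])
loss = nested_distillation_loss(
    rendered,
    prefix_dims=[2, 4],
    decoders=[[[1, 0], [0, 1]], [[1, 0, 0, 0], [0, 0, 0, 1]]],
    targets=[[1.0, 0.0], [1.0, 1.0]],
)
```

The design point the sketch tries to convey is that all scales share one rendered vector, so a coarser or finer query only changes how many leading dimensions are read, and decoding happens after compositing rather than at every ray sample.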