Semantic scene completion via semantic-aware guidance and interactive refinement transformer

Bibliographic Details
Main Authors:	Xiao, Haihong; Kang, Wenxiong; Liu, Hao; Li, Yuqiong; He, Ying
Other Authors:	College of Computing and Data Science
Format:	Journal Article
Language:	English
Published:	2025
Subjects:	Computer and Information Science; Semantic scene completion; Interactive refinement transformer
Online Access:	https://hdl.handle.net/10356/182767

Description:	Predicting per-voxel occupancy status and corresponding semantic labels in 3D scenes is pivotal to 3D intelligent perception in autonomous driving. In this paper, we propose a novel semantic scene completion framework that can generate complete 3D volumetric semantics from a single image at a low cost. To the best of our knowledge, this is the first endeavor specifically aimed at mitigating the negative impacts of incorrect voxel query proposals caused by erroneous depth estimates, and at enhancing interactions for correct ones, in camera-based semantic scene completion tasks. Specifically, we present a straightforward yet effective Semantic-aware Guided (SAG) module, which seamlessly integrates with task-related semantic priors to facilitate effective interactions between image features and voxel query proposals in a plug-and-play manner. Furthermore, we introduce a set of learnable object queries to better perceive objects within the scene. Building on this, we propose an Interactive Refinement Transformer (IRT) block, which iteratively updates voxel query proposals through query-to-query cross-attention between object queries and voxel queries, enhancing the perception of semantics and objects within the scene. Extensive experiments demonstrate that our method outperforms existing state-of-the-art approaches, achieving overall mIoU improvements of 0.30 and 2.74 on the SemanticKITTI and SSCBench-KITTI-360 validation sets, respectively, while also showing superior performance in generating small objects.
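The abstract describes the IRT block only at a high level: voxel query proposals are updated iteratively through query-to-query cross-attention with a set of learnable object queries. The following is a minimal pure-Python sketch of that general idea under stated assumptions, not the authors' implementation. It assumes single-head scaled dot-product attention in which voxel queries act as queries and object queries supply keys and values, a plain residual update repeated for a fixed number of iterations, and no normalization, feed-forward layers, or update of the object queries themselves. It also includes a standard mean Intersection-over-Union (mIoU) computation, the metric in which the reported gains are stated. Which classes are averaged (for example, whether the empty class is excluded) differs by benchmark, so the sketch simply averages over classes that occur in the prediction or ground truth. All function names, weight matrices, and sizes are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def query_to_query_cross_attention(voxel_queries, object_queries, Wq, Wk, Wv):
    """Assumed form: single-head scaled dot-product cross-attention where each
    voxel query attends over keys/values projected from the object queries."""
    d = len(Wq)  # projected dimension
    keys = [matvec(Wk, o) for o in object_queries]
    values = [matvec(Wv, o) for o in object_queries]
    attended = []
    for v in voxel_queries:
        q = matvec(Wq, v)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        attended.append([sum(w * val[j] for w, val in zip(weights, values)) for j in range(d)])
    return attended


def refine_voxel_queries(voxel_queries, object_queries, Wq, Wk, Wv, num_iters=3):
    """Hypothetical iterative refinement: repeat the cross-attention and add the
    result back via a residual connection (needs square D x D weights).
    Object queries stay fixed; no normalization or feed-forward layers."""
    for _ in range(num_iters):
        attended = query_to_query_cross_attention(voxel_queries, object_queries, Wq, Wk, Wv)
        voxel_queries = [[x + a for x, a in zip(v, att)]
                         for v, att in zip(voxel_queries, attended)]
    return voxel_queries


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Per-class IoU = TP / (TP + FP + FN), averaged over classes with a
    non-empty union; target labels equal to ignore_index are skipped."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    ious = [tp[c] / (tp[c] + fp[c] + fn[c])
            for c in range(num_classes) if tp[c] + fp[c] + fn[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy usage: random placeholder values, not trained weights.
random.seed(0)
D = 4


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


voxel_queries = rand_matrix(6, D)   # 6 hypothetical voxel query proposals
object_queries = rand_matrix(3, D)  # 3 hypothetical learnable object queries
Wq, Wk, Wv = rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D)
refined = refine_voxel_queries(voxel_queries, object_queries, Wq, Wk, Wv, num_iters=2)
print(len(refined), len(refined[0]))  # 6 4
print(mean_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3))  # 0.6666666666666666
```

Keeping the object queries fixed and omitting normalization keeps the sketch short; a practical transformer block would typically add multiple heads, layer normalization, a feed-forward sublayer, and projections learned during training.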
Institution:	Nanyang Technological University
Funding:	This work was funded by the National Natural Science Foundation of China (No. 62376100) and the Natural Science Foundation of Guangdong Province of China (No. 2022A1515010114).
Citation:	Xiao, H., Kang, W., Liu, H., Li, Y. & He, Y. (2024). Semantic scene completion via semantic-aware guidance and interactive refinement transformer. IEEE Transactions on Circuits and Systems for Video Technology, 3518493-. https://dx.doi.org/10.1109/TCSVT.2024.3518493
Journal:	IEEE Transactions on Circuits and Systems for Video Technology
ISSN:	1051-8215
DOI:	10.1109/TCSVT.2024.3518493
Scopus ID:	2-s2.0-85212763209
Date issued:	2024
Date available:	2025-02-24
Rights:	© 2024 IEEE. All rights reserved.
