Sparse point‐voxel aggregation network for efficient point cloud semantic segmentation

Abstract Effective and efficient semantic segmentation of 3D point cloud data is important for many tasks. Many methods for point cloud semantic segmentation rely on computationally expensive sampling and grouping layers to process irregular points, while others convert irregular points into regular...

Bibliographic Details
Main Authors:	Zheng Fang, Binyu Xiong, Fei Liu
Format:	Article
Language:	English
Published:	Wiley 2022-10-01
Series:	IET Computer Vision
Subjects:	image segmentation, interpolation, multilayer perceptrons, convolutional neural nets
Online Access:	https://doi.org/10.1049/cvi2.12131

_version_	1828099836243607552
author	Zheng Fang, Binyu Xiong, Fei Liu
author_sort	Zheng Fang
collection	DOAJ
description	Abstract Effective and efficient semantic segmentation of 3D point cloud data is important for many tasks. Many methods for point cloud semantic segmentation rely on computationally expensive sampling and grouping layers to process irregular points, while others convert irregular points into regular volumetric grids and process them with a 3D U‐Net‐based semantic segmentation network. However, most of these methods suffer from high computational costs and cannot be applied to the real‐time processing of large‐scale point clouds. To address these issues, we propose a computationally efficient point‐voxel‐based network architecture named Sparse Point‐Voxel Aggregation Network (SPVAN) for point cloud semantic segmentation. It consists of an encoding layer that consists of sparse convolution and MLP layers and a new decoding layer called Point Feature Aggregation Layer (PFAL) that is only composed of feature interpolation and MLP layers. Compared with recent popular point‐voxel‐based methods with the U‐Net‐based network, our method does not need 3D convolution networks in the decoding layer and thus achieves a higher speed. Experimental results on the large‐scale SemanticKITTI dataset show that our method gets a good balance between the efficiency and the performance. Moreover, our method achieves on‐par or better performance than previous methods for semantic segmentation on the challenging S3DIS dataset.
first_indexed	2024-04-11T08:19:51Z
format	Article
id	doaj.art-0fc06b821b53410796dd727fbc8192f2
institution	Directory Open Access Journal
issn	1751-9632 1751-9640
language	English
last_indexed	2024-04-11T08:19:51Z
publishDate	2022-10-01
publisher	Wiley
record_format	Article
series	IET Computer Vision
spelling	doaj.art-0fc06b821b53410796dd727fbc8192f2; 2022-12-22T04:34:59Z; eng; Wiley; IET Computer Vision; ISSN 1751-9632, 1751-9640; 2022-10-01; vol. 16, no. 7, pp. 644-654; 10.1049/cvi2.12131; Sparse point‐voxel aggregation network for efficient point cloud semantic segmentation; Zheng Fang, Binyu Xiong, Fei Liu (Faculty of Robot Science and Engineering, Northeastern University, Shenyang, Liaoning, China)
title	Sparse point‐voxel aggregation network for efficient point cloud semantic segmentation
topic	image segmentation, interpolation, multilayer perceptrons, convolutional neural nets
url	https://doi.org/10.1049/cvi2.12131

Similar Items