Sparse point‐voxel aggregation network for efficient point cloud semantic segmentation

Abstract Effective and efficient semantic segmentation of 3D point cloud data is important for many tasks. Many methods for point cloud semantic segmentation rely on computationally expensive sampling and grouping layers to process irregular points, while others convert irregular points into regular...


Bibliographic Details
Main Authors: Zheng Fang, Binyu Xiong, Fei Liu
Format: Article
Language: English
Published: Wiley 2022-10-01
Series: IET Computer Vision
Subjects: image segmentation, interpolation, multilayer perceptrons, convolutional neural nets
Online Access: https://doi.org/10.1049/cvi2.12131
_version_ 1828099836243607552
author Zheng Fang
Binyu Xiong
Fei Liu
author_facet Zheng Fang
Binyu Xiong
Fei Liu
author_sort Zheng Fang
collection DOAJ
description Abstract Effective and efficient semantic segmentation of 3D point cloud data is important for many tasks. Many methods for point cloud semantic segmentation rely on computationally expensive sampling and grouping layers to process irregular points, while others convert irregular points into regular volumetric grids and process them with a 3D U‐Net‐based semantic segmentation network. However, most of these methods suffer from high computational costs and cannot be applied to the real‐time processing of large‐scale point clouds. To address these issues, we propose a computationally efficient point‐voxel‐based network architecture named Sparse Point‐Voxel Aggregation Network (SPVAN) for point cloud semantic segmentation. It comprises an encoding layer built from sparse convolution and MLP layers and a new decoding layer, the Point Feature Aggregation Layer (PFAL), which is composed only of feature interpolation and MLP layers. Unlike recent popular point‐voxel‐based methods built on U‐Net‐based networks, our method needs no 3D convolution networks in the decoding layer and thus achieves a higher speed. Experimental results on the large‐scale SemanticKITTI dataset show that our method strikes a good balance between efficiency and performance. Moreover, our method achieves performance on par with or better than previous methods for semantic segmentation on the challenging S3DIS dataset.
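The abstract says only that the decoding layer (PFAL) combines feature interpolation with MLP layers; it does not give the interpolation scheme. The sketch below is therefore an illustration under stated assumptions, not the authors' implementation: it assumes inverse-distance weighting over the k nearest coarse (voxel) centers, followed by one shared linear layer with ReLU. The function names `interpolate_features` and `mlp_layer` are hypothetical.

```python
import math


def interpolate_features(points, centers, center_feats, k=3, eps=1e-8):
    """Assumed scheme: inverse-distance-weighted average of the features
    of the k nearest coarse centers, computed for every point."""
    dim = len(center_feats[0])
    out = []
    for p in points:
        # Euclidean distance from this point to every coarse center,
        # keeping only the k closest (distance, index) pairs.
        nearest = sorted((math.dist(p, c), i) for i, c in enumerate(centers))[:k]
        weights = [1.0 / (d + eps) for d, _ in nearest]
        total = sum(weights)
        out.append([
            sum(w * center_feats[i][j] for w, (_, i) in zip(weights, nearest)) / total
            for j in range(dim)
        ])
    return out


def mlp_layer(feats, weight, bias):
    """One shared linear layer plus ReLU, applied to each point independently.
    weight has one row per output channel."""
    return [
        [max(0.0, sum(x * w for x, w in zip(f, row)) + b) for row, b in zip(weight, bias)]
        for f in feats
    ]
```

A decoding step in this style would call `interpolate_features` to carry coarse features back to the original points and then pass the result through `mlp_layer`; no 3D convolution is involved, which is the property the abstract credits for the speed gain.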
first_indexed 2024-04-11T08:19:51Z
format Article
id doaj.art-0fc06b821b53410796dd727fbc8192f2
institution Directory Open Access Journal
issn 1751-9632
1751-9640
language English
last_indexed 2024-04-11T08:19:51Z
publishDate 2022-10-01
publisher Wiley
record_format Article
series IET Computer Vision
spelling doaj.art-0fc06b821b53410796dd727fbc8192f2 2022-12-22T04:34:59Z eng Wiley IET Computer Vision 1751-9632 1751-9640 2022-10-01 16 7 644 654 10.1049/cvi2.12131
Sparse point‐voxel aggregation network for efficient point cloud semantic segmentation
Zheng Fang (0), Binyu Xiong (1), Fei Liu (2)
(0, 1, 2) Faculty of Robot Science and Engineering, Northeastern University, Shenyang, Liaoning, China
https://doi.org/10.1049/cvi2.12131
image segmentation; interpolation; multilayer perceptrons; convolutional neural nets
spellingShingle Zheng Fang
Binyu Xiong
Fei Liu
Sparse point‐voxel aggregation network for efficient point cloud semantic segmentation
IET Computer Vision
image segmentation
interpolation
multilayer perceptrons
convolutional neural nets
title Sparse point‐voxel aggregation network for efficient point cloud semantic segmentation
title_full Sparse point‐voxel aggregation network for efficient point cloud semantic segmentation
title_fullStr Sparse point‐voxel aggregation network for efficient point cloud semantic segmentation
title_full_unstemmed Sparse point‐voxel aggregation network for efficient point cloud semantic segmentation
title_short Sparse point‐voxel aggregation network for efficient point cloud semantic segmentation
title_sort sparse point voxel aggregation network for efficient point cloud semantic segmentation
topic image segmentation
interpolation
multilayer perceptrons
convolutional neural nets
url https://doi.org/10.1049/cvi2.12131
work_keys_str_mv AT zhengfang sparsepointvoxelaggregationnetworkforefficientpointcloudsemanticsegmentation
AT binyuxiong sparsepointvoxelaggregationnetworkforefficientpointcloudsemanticsegmentation
AT feiliu sparsepointvoxelaggregationnetworkforefficientpointcloudsemanticsegmentation