Develop spatial learning indexing using improved K-means clustering partition(基于改进的K-means聚类分区均匀化空间学习索引)

As data volumes grow rapidly, the shortcomings of traditional spatial indexes become increasingly apparent. A learned index, by contrast, is built on the data distribution: its size does not grow with the amount of data, and it can achieve better performance without performing h...

Bibliographic Details
Main Authors: 傅晨华(FU Chenhua), 张丰(ZHANG Feng), 胡林舒(HU Linshu), 王立君(WANG Lijun)
Format: Article
Language: zho
Published: Zhejiang University Press 2024-03-01
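Series: Zhejiang Daxue xuebao. Lixue ban
Subjects: learned index(学习索引); k-means clustering(k-means聚类); space filling curve(空间填充曲线); spatial index(空间索引)
Online Access: https://doi.org/10.3785/j.issn.1008-9497.2024.02.003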
author 傅晨华(FU Chenhua)
张丰(ZHANG Feng)
胡林舒(HU Linshu)
王立君(WANG Lijun)
collection DOAJ
description As data volumes grow rapidly, the shortcomings of traditional spatial indexes become increasingly apparent: the index size expands with the amount of data, and query efficiency is low. A learned index, by contrast, is built on the data distribution; its size does not grow with the amount of data, and it achieves better performance without hierarchical comparisons. Two difficulties remain in applying the learned-index idea to spatial data: (1) choosing an appropriate dimension-reduction method to sort the spatial data, and (2) simplifying the distribution of the dimension-reduced data so that it is easy to fit. This paper proposes a grid mixed cluster partition learned index (grid-ml). To address the first difficulty, grid-ml uses a Z-curve for dimension reduction and handles the curve's jumping problem with a double-layer grid structure, which also optimizes the query strategy. To address the second, an improved K-means clustering method partitions the data, making the data distribution more uniform and therefore easier to fit. Comparative experiments show that grid-ml builds quickly, requires little storage space, and answers queries efficiently, giving it significant advantages over traditional spatial indexes.
first_indexed 2024-04-24T15:35:40Z
format Article
id doaj.art-cb31e4220ac0486b89186b769d672dd7
institution Directory Open Access Journal
issn 1008-9497
language zho
last_indexed 2024-04-24T15:35:40Z
publishDate 2024-03-01
publisher Zhejiang University Press
record_format Article
series Zhejiang Daxue xuebao. Lixue ban
spelling doaj.art-cb31e4220ac0486b89186b769d672dd7
2024-04-02T02:09:53Z
Language: zho
Publisher: Zhejiang University Press
Journal: Zhejiang Daxue xuebao. Lixue ban
ISSN: 1008-9497
Published: 2024-03-01
Volume 51, Issue 2, Pages 153-161
DOI: 10.3785/j.issn.1008-9497.2024.02.003
Title: Develop spatial learning indexing using improved K-means clustering partition(基于改进的K-means聚类分区均匀化空间学习索引)
Authors: 傅晨华(FU Chenhua), https://orcid.org/0009-0002-0683-624X; 张丰(ZHANG Feng), https://orcid.org/0000-0003-1475-8480; 胡林舒(HU Linshu); 王立君(WANG Lijun)
Affiliation (all four authors): Earth Science College, Zhejiang University, Hangzhou 310058, China
Keywords: learned index(学习索引); k-means clustering(k-means聚类); space filling curve(空间填充曲线); spatial index(空间索引)
title Develop spatial learning indexing using improved K-means clustering partition(基于改进的K-means聚类分区均匀化空间学习索引)
topic learned index(学习索引)
k-means clustering(k-means聚类)
space filling curve(空间填充曲线)
spatial index(空间索引)
url https://doi.org/10.3785/j.issn.1008-9497.2024.02.003
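
The abstract in this record names three steps: Z-curve dimension reduction, cluster-based partitioning of the resulting keys, and fitting a model per partition. The sketch below illustrates that general pipeline under stated assumptions and is not the authors' implementation. Plain Lloyd's K-means on the 1-D keys stands in for the paper's improved K-means (the record does not describe the improvement), a least-squares line per partition stands in for the learned model, and the double-layer grid is omitted. All names (`interleave_bits`, `kmeans_1d`, `fit_linear`, `LearnedIndex`) are hypothetical.

```python
import bisect
import random


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) code: bit i of x lands on bit 2i, bit i of y on bit 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def kmeans_1d(keys, k, iters=20):
    """Plain Lloyd's k-means on sorted 1-D keys (a stand-in, not the paper's variant).

    On sorted keys every cluster is a contiguous run, so the result is a list
    of k+1 slice boundaries into `keys`.
    """
    n = len(keys)
    centers = [keys[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    bounds = [0, n]
    for _ in range(iters):
        # A key belongs to the nearer of two adjacent centers: cut at the midpoint.
        cuts = [bisect.bisect_left(keys, (centers[j] + centers[j + 1]) / 2)
                for j in range(k - 1)]
        bounds = [0] + cuts + [n]
        updated = []
        for j in range(k):
            seg = keys[bounds[j]:bounds[j + 1]]
            updated.append(sum(seg) / len(seg) if seg else centers[j])
        if updated == centers:
            break
        centers = updated
    return bounds


def fit_linear(keys, lo, hi):
    """Least-squares line position ~ a*key + b over keys[lo:hi], plus an error bound."""
    xs = keys[lo:hi]
    n = hi - lo
    mean_x = sum(xs) / n
    mean_y = lo + (n - 1) / 2
    var = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (lo + i - mean_y) for i, x in enumerate(xs))
    a = cov / var if var else 0.0
    b = mean_y - a * mean_x
    err = max(abs(a * x + b - (lo + i)) for i, x in enumerate(xs))
    return a, b, int(err) + 2  # padded so the rounded prediction stays in range


class LearnedIndex:
    def __init__(self, points, k=8):
        self.keys = sorted(interleave_bits(x, y) for x, y in points)
        bounds = kmeans_1d(self.keys, k)
        # One (lo, hi, slope, intercept, error) tuple per non-empty partition.
        self.parts = [(lo, hi, *fit_linear(self.keys, lo, hi))
                      for lo, hi in zip(bounds, bounds[1:]) if hi > lo]
        self.first_keys = [self.keys[lo] for lo, *_ in self.parts]

    def find(self, x, y):
        """Return the position of point (x, y) in the sorted key array, or -1."""
        key = interleave_bits(x, y)
        j = bisect.bisect_right(self.first_keys, key) - 1
        if j < 0:
            return -1
        lo, hi, a, b, err = self.parts[j]
        pred = round(a * key + b)
        # Search only the window the model's error bound allows, clamped to the partition.
        start = min(max(lo, pred - err), hi)
        stop = max(start, min(hi, pred + err + 1))
        i = bisect.bisect_left(self.keys, key, start, stop)
        return i if i < stop and self.keys[i] == key else -1


if __name__ == "__main__":
    rng = random.Random(0)
    pts = list({(rng.randrange(1000), rng.randrange(1000)) for _ in range(5000)})
    index = LearnedIndex(pts, k=8)
    print(index.find(*pts[0]), [err for *_, err in index.parts])
```

Partitioning before fitting is the design point the abstract stresses: a single line over all Z-order keys would need a wide search window wherever the key distribution bends, whereas each partition's own line only has to cover a locally more uniform run of keys, so the per-partition error bound (and therefore the final search window) tends to be smaller.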