Develop spatial learning indexing using improved K-means clustering partition(基于改进的K-means聚类分区均匀化空间学习索引)
As data volumes grow rapidly, the shortcomings of traditional spatial indexes become increasingly apparent. A learned index, by contrast, is built on the data distribution: its size does not grow with the amount of data, and it can achieve better performance without performing h...
Main Authors: | 傅晨华(FU Chenhua), 张丰(ZHANG Feng), 胡林舒(HU Linshu), 王立君(WANG Lijun) |
---|---|
Format: | Article |
Language: | zho |
Published: | Zhejiang University Press, 2024-03-01 |
Series: | Zhejiang Daxue xuebao. Lixue ban |
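Subjects: | learned index(学习索引); k-means clustering(k-means聚类); space filling curve(空间填充曲线); spatial index(空间索引) |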
Online Access: | https://doi.org/10.3785/j.issn.1008-9497.2024.02.003 |
_version_ | 1797230889833332736 |
---|---|
author | 傅晨华(FU Chenhua) 张丰(ZHANG Feng) 胡林舒(HU Linshu) 王立君(WANG Lijun) |
author_facet | 傅晨华(FU Chenhua) 张丰(ZHANG Feng) 胡林舒(HU Linshu) 王立君(WANG Lijun) |
author_sort | 傅晨华(FU Chenhua) |
collection | DOAJ |
description | As data volumes grow rapidly, the shortcomings of traditional spatial indexes become increasingly apparent. A learned index, by contrast, is built on the data distribution: its size does not grow with the amount of data, and it achieves better performance without hierarchical comparisons. Two difficulties nevertheless remain in applying learned indexing to spatial data: (1) choosing an appropriate dimension-reduction method to sort the spatial data; (2) simplifying the distribution of the dimension-reduced data so that it is easy to fit. This paper proposes a grid mixed cluster partition learned index (grid-ml). To address these two difficulties, grid-ml uses a Z-curve for dimension reduction and a double-layer grid structure to handle the curve's jumping problem; an improved K-means clustering method then partitions the data to simplify its distribution and make it more uniform. Comparative experiments show that grid-ml builds quickly, requires little storage, and answers queries efficiently, demonstrating significant advantages over traditional spatial indexes.(传统空间索引的体量随数据量的增加而膨胀,查询效率较低。学习索引的体量不随数据量的增加而膨胀,同时避免了层级比较查询,性能优异。将学习索引应用于空间索引存在2个难点:一是选取合适的降维方法实现空间数据的排序;二是对降维后数据序列进行有效的简化分布计算,使其易于拟合。基于此,提出了一种网格混合聚类分区学习索引(grid-ml),用z曲线进行降维,用双层网格结构优化查询策略,用改进的K-means聚类算法进行数据分区,实现数据分布均匀化。对比实验发现,grid-ml构建速度快、存储空间小、查询效率高,较传统空间索引优势显著。) |
first_indexed | 2024-04-24T15:35:40Z |
format | Article |
id | doaj.art-cb31e4220ac0486b89186b769d672dd7 |
institution | Directory Open Access Journal |
issn | 1008-9497 |
language | zho |
last_indexed | 2024-04-24T15:35:40Z |
publishDate | 2024-03-01 |
publisher | Zhejiang University Press |
record_format | Article |
series | Zhejiang Daxue xuebao. Lixue ban |
spelling | doaj.art-cb31e4220ac0486b89186b769d672dd7; 2024-04-02T02:09:53Z; zho; Zhejiang University Press; Zhejiang Daxue xuebao. Lixue ban; ISSN 1008-9497; 2024-03-01; vol. 51, no. 2, pp. 153-161; DOI 10.3785/j.issn.1008-9497.2024.02.003; Develop spatial learning indexing using improved K-means clustering partition(基于改进的K-means聚类分区均匀化空间学习索引); 傅晨华(FU Chenhua), https://orcid.org/0009-0002-0683-624X; 张丰(ZHANG Feng), https://orcid.org/0000-0003-1475-8480; 胡林舒(HU Linshu); 王立君(WANG Lijun); all authors: Earth Science College, Zhejiang University, Hangzhou 310058, China(浙江大学 地球科学学院,浙江 杭州 310058); https://doi.org/10.3785/j.issn.1008-9497.2024.02.003; learned index(学习索引); k-means clustering(k-means聚类); space filling curve(空间填充曲线); spatial index(空间索引) |
title | Develop spatial learning indexing using improved K-means clustering partition(基于改进的K-means聚类分区均匀化空间学习索引) |
title_sort | develop spatial learning indexing using improved k means clustering partition 基于改进的k means聚类分区均匀化空间学习索引 |
topic | learned index(学习索引) k-means clustering(k-means聚类) space filling curve(空间填充曲线) spatial index(空间索引) |
url | https://doi.org/10.3785/j.issn.1008-9497.2024.02.003 |
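
The pipeline outlined in the description (Z-curve dimension reduction, partitioning, then fitting a simple model per partition) can be illustrated with a minimal sketch. This is not the authors' grid-ml implementation. The names `interleave_bits`, `fit_line` and `PartitionedLearnedIndex` are hypothetical, an equal-count split stands in for the improved K-means partitioning, a linear model per partition is an assumption, and the double-layer grid is not modeled. Coordinates are assumed to be non-negative integers below 2^16.

```python
from bisect import bisect_left, bisect_right


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) code: interleave the low `bits` bits of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits occupy even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits occupy odd positions
    return z


def fit_line(keys):
    """Least-squares fit of position ~ slope * key + intercept.

    Also returns the largest absolute prediction error over `keys`,
    which bounds the search window at query time.
    """
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(keys))
    slope = cov / var if var else 0.0
    intercept = mean_p - slope * mean_k
    err = max(abs(p - round(slope * k + intercept)) for p, k in enumerate(keys))
    return slope, intercept, err


class PartitionedLearnedIndex:
    """Illustrative index: sorted Z-order keys, one linear model per partition."""

    def __init__(self, points, num_partitions=4, bits=16):
        self.bits = bits
        keys = sorted(interleave_bits(x, y, bits) for x, y in points)
        # Assumption: equal-count partitions replace the improved K-means step.
        size = max(1, -(-len(keys) // num_partitions))  # ceiling division
        self.parts = []
        for start in range(0, len(keys), size):
            chunk = keys[start:start + size]
            self.parts.append((chunk, fit_line(chunk)))
        self.bounds = [chunk[0] for chunk, _ in self.parts]

    def contains(self, x, y):
        key = interleave_bits(x, y, self.bits)
        i = bisect_right(self.bounds, key) - 1  # partition whose range covers key
        if i < 0:
            return False
        chunk, (slope, intercept, err) = self.parts[i]
        guess = round(slope * key + intercept)  # model-predicted position
        lo = max(0, guess - err)
        hi = min(len(chunk), guess + err + 1)
        j = bisect_left(chunk, key, lo, hi)     # search only the error window
        return j < hi and chunk[j] == key
```

For example, `PartitionedLearnedIndex([(3, 5), (10, 2)]).contains(3, 5)` looks up the Z-order key of (3, 5) inside the window predicted by the partition's model. The stored model per partition is three numbers regardless of how many keys the partition holds, which is the property the description attributes to learned indexes.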