A single-pass grid-based algorithm for clustering big data on spatial databases

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017

Bibliographic Details
Main Author: Taratoris, Evangelos.
Other Authors: Samuel R. Madden.
Format: Thesis
Language: English
Published: Massachusetts Institute of Technology 2018
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/113168
Abstract: The problem of clustering multi-dimensional data has been well researched in the scientific community. It is a problem with wide scope and applications. With the rapid growth of very large databases, traditional clustering algorithms become inefficient due to insufficient memory capacity. Grid-based algorithms try to solve this problem by dividing the space into cells and then performing clustering on the cells. However, these algorithms also become inefficient when even the grid becomes too large to be stored in memory. This thesis presents a new algorithm, SingleClus, that performs clustering on a 2-dimensional dataset with a single pass over the dataset. Moreover, it optimizes the number of disk I/O operations while making modest use of main memory; therefore, it is theoretically optimal in terms of performance. It modifies and improves on the Hoshen-Kopelman clustering algorithm while addressing the algorithm's fundamental challenges when operating in a Big Data setting.
Statement of Responsibility: by Evangelos Taratoris.
Degree: M. Eng.
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (pages 79-80).
Physical Description: 80 pages (application/pdf)
Date Available: 2018-01-12
Rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582
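The abstract names two ingredients: dividing the space into grid cells, and the Hoshen-Kopelman cluster-labeling algorithm. As background only, here is a minimal in-memory sketch of those two classical steps; it is not SingleClus. The function names, the cell_size and min_points parameters, and the choice of 4-connectivity are illustrative assumptions, and this version keeps the whole grid in memory, which is precisely the limitation the thesis sets out to address.

```python
from collections import defaultdict

# Hypothetical helper names for illustration; not the thesis's SingleClus code.

def points_to_cells(points, cell_size):
    """Bin 2-D points into (row, col) grid cells and count points per cell."""
    counts = defaultdict(int)
    for x, y in points:
        # Floor division also places negative coordinates in the right cell.
        counts[(int(y // cell_size), int(x // cell_size))] += 1
    return counts


def hoshen_kopelman(occupied):
    """Label 4-connected clusters of occupied (row, col) cells.

    Cells are visited in raster (row-major) order. Each cell looks only at
    its already-visited neighbours above and to the left; when both carry
    labels, the two provisional labels are merged with union-find.
    """
    parent = []  # parent[label] -> parent label in the union-find forest

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        root = min(ra, rb)
        parent[max(ra, rb)] = root
        return root

    labels = {}
    for cell in sorted(occupied):  # sorted tuples give row-major order
        r, c = cell
        up = labels.get((r - 1, c))
        left = labels.get((r, c - 1))
        if up is None and left is None:
            parent.append(len(parent))  # start a new provisional label
            labels[cell] = len(parent) - 1
        elif up is None:
            labels[cell] = left
        elif left is None:
            labels[cell] = up
        else:
            labels[cell] = union(up, left)

    # Second pass: resolve provisional labels to compact cluster ids 0..k-1.
    roots, result = {}, {}
    for cell in sorted(labels):
        root = find(labels[cell])
        result[cell] = roots.setdefault(root, len(roots))
    return result


def grid_cluster(points, cell_size, min_points=1):
    """Keep cells holding at least min_points points, then label clusters."""
    counts = points_to_cells(points, cell_size)
    dense = {cell for cell, n in counts.items() if n >= min_points}
    return hoshen_kopelman(dense)


if __name__ == "__main__":
    pts = [(0.2, 0.3), (1.5, 0.4), (1.7, 0.9), (5.1, 5.2), (5.3, 5.8)]
    print(grid_cluster(pts, cell_size=1.0))
```

With cell_size=1.0, the points (0.2, 0.3), (1.5, 0.4) and (1.7, 0.9) fall in the adjacent cells (0, 0) and (0, 1) and share cluster 0, while (5.1, 5.2) and (5.3, 5.8) fall in cell (5, 5) and form cluster 1. The union step matters for shapes such as a "U", whose two arms receive different provisional labels until the bottom row joins them.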