A Varied Density-based Clustering Approach for Event Detection from Heterogeneous Twitter Data
Extracting the latent knowledge from Twitter by applying spatial clustering on geotagged tweets provides the ability to discover events and their locations. DBSCAN (density-based spatial clustering of applications with noise), which has been widely used to retrieve events from geotagged tweets, cann...
Main Authors: | Zeinab Ghaemi, Mahdi Farnaghi |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2019-02-01 |
Series: | ISPRS International Journal of Geo-Information |
Subjects: | spatial clustering; density-based clustering; spatial heterogeneity; text similarity; Twitter |
Online Access: | https://www.mdpi.com/2220-9964/8/2/82 |
_version_ | 1818764912997957632 |
---|---|
author | Zeinab Ghaemi; Mahdi Farnaghi |
author_facet | Zeinab Ghaemi; Mahdi Farnaghi |
author_sort | Zeinab Ghaemi |
collection | DOAJ |
description | Extracting latent knowledge from Twitter by applying spatial clustering to geotagged tweets makes it possible to discover events and their locations. DBSCAN (density-based spatial clustering of applications with noise), which has been widely used to retrieve events from geotagged tweets, cannot efficiently detect clusters when there is significant spatial heterogeneity in the dataset, as is the case for Twitter data, where the distribution of users and the intensity of tweet publishing vary over the study area. This study proposes the VDCT (Varied Density-based spatial Clustering for Twitter data) algorithm, which extracts clusters from geotagged tweets while accounting for spatial heterogeneity. The algorithm employs exponential spline interpolation to determine different search radii for cluster detection. In addition to spatial proximity, the algorithm also takes textual similarity among tweets into account. To examine the efficiency of the algorithm, geotagged tweets collected during a hurricane in the United States were used for event detection. The output clusters of VDCT were compared with those of DBSCAN. Visual and quantitative comparison of the results demonstrated the feasibility of the proposed method. |
first_indexed | 2024-12-18T08:09:45Z |
format | Article |
id | doaj.art-03882912f4894b2a86c3ccd657226324 |
institution | Directory Open Access Journal |
issn | 2220-9964 |
language | English |
last_indexed | 2024-12-18T08:09:45Z |
publishDate | 2019-02-01 |
publisher | MDPI AG |
record_format | Article |
series | ISPRS International Journal of Geo-Information |
spelling | doaj.art-03882912f4894b2a86c3ccd657226324; 2022-12-21T21:14:54Z; eng; MDPI AG; ISPRS International Journal of Geo-Information; 2220-9964; 2019-02-01; 8; 2; 82; 10.3390/ijgi8020082; ijgi8020082; A Varied Density-based Clustering Approach for Event Detection from Heterogeneous Twitter Data; Zeinab Ghaemi [0]; Mahdi Farnaghi [1]; Faculty of Geodesy and Geomatics Engineering, K. N. Toosi University of Technology, Tehran 1996715433, Iran; Faculty of Geodesy and Geomatics Engineering, K. N. Toosi University of Technology, Tehran 1996715433, Iran; Extracting latent knowledge from Twitter by applying spatial clustering to geotagged tweets makes it possible to discover events and their locations. DBSCAN (density-based spatial clustering of applications with noise), which has been widely used to retrieve events from geotagged tweets, cannot efficiently detect clusters when there is significant spatial heterogeneity in the dataset, as is the case for Twitter data, where the distribution of users and the intensity of tweet publishing vary over the study area. This study proposes the VDCT (Varied Density-based spatial Clustering for Twitter data) algorithm, which extracts clusters from geotagged tweets while accounting for spatial heterogeneity. The algorithm employs exponential spline interpolation to determine different search radii for cluster detection. In addition to spatial proximity, the algorithm also takes textual similarity among tweets into account. To examine the efficiency of the algorithm, geotagged tweets collected during a hurricane in the United States were used for event detection. The output clusters of VDCT were compared with those of DBSCAN. Visual and quantitative comparison of the results demonstrated the feasibility of the proposed method.; https://www.mdpi.com/2220-9964/8/2/82; spatial clustering; density-based clustering; spatial heterogeneity; text similarity; twitter |
spellingShingle | Zeinab Ghaemi; Mahdi Farnaghi; A Varied Density-based Clustering Approach for Event Detection from Heterogeneous Twitter Data; ISPRS International Journal of Geo-Information; spatial clustering; density-based clustering; spatial heterogeneity; text similarity |
title | A Varied Density-based Clustering Approach for Event Detection from Heterogeneous Twitter Data |
title_sort | varied density based clustering approach for event detection from heterogeneous twitter data |
topic | spatial clustering; density-based clustering; spatial heterogeneity; text similarity |
url | https://www.mdpi.com/2220-9964/8/2/82 |
work_keys_str_mv | AT zeinabghaemi avarieddensitybasedclusteringapproachforeventdetectionfromheterogeneoustwitterdata AT mahdifarnaghi avarieddensitybasedclusteringapproachforeventdetectionfromheterogeneoustwitterdata AT zeinabghaemi varieddensitybasedclusteringapproachforeventdetectionfromheterogeneoustwitterdata AT mahdifarnaghi varieddensitybasedclusteringapproachforeventdetectionfromheterogeneoustwitterdata |
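
The description above contrasts DBSCAN's single search radius with a clustering that varies the radius by local density and also requires textual similarity. The following is a minimal sketch of that idea only, not the VDCT algorithm itself: the per-point radius here is a scaled k-th-nearest-neighbour distance, a stand-in assumed for illustration in place of the exponential spline interpolation the record mentions, and Jaccard token overlap is an assumed text-similarity measure. All function names, parameters, and sample tweets are hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard similarity between the lower-cased token sets of two texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def local_radius(points, i, k):
    """Distance from point i to its k-th nearest neighbour.

    Assumption: a simple stand-in for the interpolation-based radius
    described in the record; it is NOT the published method.
    """
    d = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    if not d:
        return 0.0
    return d[min(k, len(d)) - 1]


def cluster(points, texts, k=2, scale=1.5, min_pts=3, min_sim=0.2, fixed_eps=None):
    """DBSCAN-style clustering with a per-point radius and a text filter.

    With fixed_eps set, every point uses the same radius (plain DBSCAN
    behaviour plus the text filter). Returns one label per point; -1 = noise.
    """
    n = len(points)
    if fixed_eps is not None:
        radii = [fixed_eps] * n
    else:
        radii = [scale * local_radius(points, i, k) for i in range(n)]

    def neighbours(i):
        # A neighbour must be spatially close AND textually similar.
        return [j for j in range(n)
                if math.dist(points[i], points[j]) <= radii[i]
                and jaccard(texts[i], texts[j]) >= min_sim]

    labels = [None] * n  # None = unvisited, -1 = noise
    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cid
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nbj = neighbours(j)
            if len(nbj) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbj)
        cid += 1
    return labels


# Hypothetical sample: a dense group, a sparse group, and one unrelated tweet.
points = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1),
          (10, 10), (10, 12), (12, 10), (12, 12),
          (50, 50)]
texts = ["hurricane flooding downtown", "hurricane flooding street",
         "hurricane flooding now", "hurricane flooding help",
         "storm power outage north", "storm power outage south",
         "storm power outage east", "storm power outage west",
         "lunch pizza"]

varied = cluster(points, texts)                  # per-point radii
single = cluster(points, texts, fixed_eps=0.15)  # one global radius
```

On this toy data the single radius tuned to the dense group leaves the sparse group as noise, while the per-point radius recovers both groups. The unrelated tweet is excluded only by the text filter, since its k-th-neighbour radius is very large; a real implementation would likely need a more careful radius estimate.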