Evaluating the Representativeness of Socio-Demographic Variables over Time for Geo-Social Media Data

Geo-social media data are widely used as a data source to model populations and processes in a variety of contexts. However, if the data do not adequately represent the population they are drawn from, analysis results will be biased. Unaddressed, these biases may lead to false interpretations and co...

Bibliographic Details
Main Authors: Andreas Petutschnig, Bernd Resch, Stefan Lang, Clemens Havas
Format: Article
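Language: English
Published: MDPI AG 2021-05-01
Series: ISPRS International Journal of Geo-Information
Subjects: geo-social media; Twitter; representativeness; spatial analysis; statistical correlations; temporal snapshots
Online Access: https://www.mdpi.com/2220-9964/10/5/323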
author Andreas Petutschnig
Bernd Resch
Stefan Lang
Clemens Havas
collection DOAJ
description Geo-social media data are widely used as a data source to model populations and processes in a variety of contexts. However, if the data do not adequately represent the population they are drawn from, analysis results will be biased. Unaddressed, these biases may lead to false interpretations and conclusions. In this paper, we propose a generic methodology for investigating the representativeness of geo-social media data for population groups of similar statistical predictive power based on reference data. The groups are designed to be spatially coherent regions with similar prediction errors. Based on these units, we investigate the influence of different socio-demographic covariates on the representativeness. We perform experiments based on over 1.6 billion tweets and 90 socio-demographic covariates. We demonstrate that Twitter data representativeness varies strongly over time and space. Our results show that densely populated areas tend to be underrepresented consistently in non-spatial models. Over time, some covariates like the number of people aged 20 years exhibit highly different effects on the prediction models, whereas others are much more stable. The spatial effects can most frequently be explained using spatial error models, indicating spatially related errors that indicate the necessity of additional covariates. Finally, we provide hints for interpreting the results of our approach for researchers using the concepts presented in this paper.
first_indexed 2024-03-10T11:34:29Z
format Article
id doaj.art-42f94465a2614fe085b1e9e0d8ee7165
institution Directory Open Access Journal
issn 2220-9964
language English
last_indexed 2024-03-10T11:34:29Z
publishDate 2021-05-01
publisher MDPI AG
record_format Article
series ISPRS International Journal of Geo-Information
spelling doaj.art-42f94465a2614fe085b1e9e0d8ee7165
2023-11-21T19:00:17Z
eng
MDPI AG
ISPRS International Journal of Geo-Information
2220-9964
2021-05-01
Volume 10, Issue 5, Article 323
10.3390/ijgi10050323
Evaluating the Representativeness of Socio-Demographic Variables over Time for Geo-Social Media Data
Andreas Petutschnig (0), Bernd Resch (1), Stefan Lang (2), Clemens Havas (3)
(0) Department of Geoinformatics—Z_GIS, University of Salzburg, 5020 Salzburg, Austria
(1) Department of Geoinformatics—Z_GIS, University of Salzburg, 5020 Salzburg, Austria
(2) Department of Geoinformatics—Z_GIS, University of Salzburg, 5020 Salzburg, Austria
(3) Department of Geoinformatics—Z_GIS, University of Salzburg, 5020 Salzburg, Austria
Geo-social media data are widely used as a data source to model populations and processes in a variety of contexts. However, if the data do not adequately represent the population they are drawn from, analysis results will be biased. Unaddressed, these biases may lead to false interpretations and conclusions. In this paper, we propose a generic methodology for investigating the representativeness of geo-social media data for population groups of similar statistical predictive power based on reference data. The groups are designed to be spatially coherent regions with similar prediction errors. Based on these units, we investigate the influence of different socio-demographic covariates on the representativeness. We perform experiments based on over 1.6 billion tweets and 90 socio-demographic covariates. We demonstrate that Twitter data representativeness varies strongly over time and space. Our results show that densely populated areas tend to be underrepresented consistently in non-spatial models. Over time, some covariates like the number of people aged 20 years exhibit highly different effects on the prediction models, whereas others are much more stable. The spatial effects can most frequently be explained using spatial error models, indicating spatially related errors that indicate the necessity of additional covariates. Finally, we provide hints for interpreting the results of our approach for researchers using the concepts presented in this paper.
https://www.mdpi.com/2220-9964/10/5/323
geo-social media
Twitter
representativeness
spatial analysis
statistical correlations
temporal snapshots
title Evaluating the Representativeness of Socio-Demographic Variables over Time for Geo-Social Media Data
topic geo-social media
Twitter
representativeness
spatial analysis
statistical correlations
temporal snapshots
url https://www.mdpi.com/2220-9964/10/5/323