Temporal Relationship between Daily Reports of COVID-19 Infections and Related GDELT and Tweet Mentions

Social media platforms are valuable data sources in the study of public reactions to events such as natural disasters and epidemics. This research assesses for selected countries around the globe the time lag between daily reports of COVID-19 cases and GDELT (Global Database of Events, Language, and...


Bibliographic Details
Main Authors: Innocensia Owuor, Hartwig H. Hochmair
Format: Article
Language: English
Published: MDPI AG 2023-09-01
Series: Geographies
Subjects: time series analysis; Twitter (X); cross-correlation; anomaly; pandemic
Online Access: https://www.mdpi.com/2673-7086/3/3/31
_version_ 1797579903159238656
author Innocensia Owuor
Hartwig H. Hochmair
author_facet Innocensia Owuor
Hartwig H. Hochmair
author_sort Innocensia Owuor
collection DOAJ
description Social media platforms are valuable data sources in the study of public reactions to events such as natural disasters and epidemics. Using time series analysis, this research assesses, for selected countries around the globe, the time lag between daily reports of COVID-19 cases and COVID-19 mentions in GDELT (Global Database of Events, Language, and Tone) articles and on Twitter (X) between February 2020 and April 2021. Results show that GDELT articles and tweets preceded COVID-19 infections in Australia, Brazil, France, Greece, India, Italy, the U.S., Canada, Germany, and the U.K., while for Poland and the Philippines, tweets preceded, and GDELT articles lagged behind, COVID-19 case reports. This shows that the application of social media and news data for surveillance and management of pandemics needs to be assessed on a case-by-case basis for different countries. It also indicates that time series analysis is applicable to only a limited number of countries because of strict data requirements (e.g., stationarity). A deviation from the lag pattern generally observed in a country, i.e., a period with low COVID-19 infections but unusually high numbers of COVID-19-related GDELT articles or tweets, signals an anomaly. We use the seasonal hybrid extreme Studentized deviate test to detect such anomalies. This is followed by text analysis of news headlines from NewsBank and Google on the dates of these anomalies to determine the probable event causing each anomaly, such as an election, a holiday, or a protest.
first_indexed 2024-03-10T22:43:39Z
format Article
id doaj.art-a9d1241254fc4e5cbcd2b1912cebc0ee
institution Directory Open Access Journal
issn 2673-7086
language English
last_indexed 2024-03-10T22:43:39Z
publishDate 2023-09-01
publisher MDPI AG
record_format Article
series Geographies
spelling doaj.art-a9d1241254fc4e5cbcd2b1912cebc0ee | 2023-11-19T10:54:26Z | eng | MDPI AG | Geographies | 2673-7086 | 2023-09-01 | 3 | 3 | 584 | 609 | 10.3390/geographies3030031 | Temporal Relationship between Daily Reports of COVID-19 Infections and Related GDELT and Tweet Mentions | Innocensia Owuor 0 | Hartwig H. Hochmair 1 | Geomatics Sciences, Fort Lauderdale Research and Education Center, University of Florida, Davie, FL 33314, USA | Geomatics Sciences, Fort Lauderdale Research and Education Center, University of Florida, Davie, FL 33314, USA | https://www.mdpi.com/2673-7086/3/3/31 | time series analysis | Twitter (X) | cross-correlation | anomaly | pandemic
spellingShingle Innocensia Owuor
Hartwig H. Hochmair
Temporal Relationship between Daily Reports of COVID-19 Infections and Related GDELT and Tweet Mentions
Geographies
time series analysis
Twitter (X)
cross-correlation
anomaly
pandemic
title Temporal Relationship between Daily Reports of COVID-19 Infections and Related GDELT and Tweet Mentions
title_sort temporal relationship between daily reports of covid 19 infections and related gdelt and tweet mentions
topic time series analysis
Twitter (X)
cross-correlation
anomaly
pandemic
url https://www.mdpi.com/2673-7086/3/3/31
work_keys_str_mv AT innocensiaowuor temporalrelationshipbetweendailyreportsofcovid19infectionsandrelatedgdeltandtweetmentions
AT hartwighhochmair temporalrelationshipbetweendailyreportsofcovid19infectionsandrelatedgdeltandtweetmentions
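
The description above refers to estimating the time lag between daily mention counts and daily case reports, and the record lists cross-correlation among its subjects. The following is only a minimal sketch of a lagged cross-correlation, not the authors' procedure: the function name best_lag, the max_lag window, the use of first differencing as a simple way to approach stationarity, and the synthetic series are all assumptions made for illustration.

```python
import numpy as np


def best_lag(mentions, cases, max_lag=21):
    """Return (lag, r) maximizing the Pearson correlation between
    mentions[t] and cases[t + lag] on first-differenced series.

    A positive lag means mentions lead cases by that many days.
    Hypothetical helper; differencing is an assumed preprocessing step.
    """
    m = np.diff(np.asarray(mentions, dtype=float))
    c = np.diff(np.asarray(cases, dtype=float))
    if len(m) != len(c):
        raise ValueError("series must have the same length")
    n = len(m)
    best = (0, -np.inf)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = m[: n - lag], c[lag:]   # pairs m[t] with c[t + lag]
        else:
            a, b = m[-lag:], c[: n + lag]  # same pairing for negative lag
        r = np.corrcoef(a, b)[0, 1]
        if r > best[1]:
            best = (lag, r)
    return best


# Synthetic illustration only (not data from the article):
# the "cases" series repeats the "mentions" series five days later.
rng = np.random.default_rng(0)
base = np.cumsum(rng.normal(size=305))
mentions = base[5:]    # mentions[t] = base[t + 5]
cases = base[:300]     # cases[t + 5] = base[t + 5] = mentions[t]
lag, r = best_lag(mentions, cases)
print(lag, round(r, 3))
```

In this sketch, a positive result would correspond to mentions preceding case reports and a negative result to mentions lagging behind them. Real daily counts would need their own stationarity checks before such a correlation is interpreted.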