Forecasting COVID-19 caseloads using unsupervised embedding clusters of social media posts

We present a novel approach incorporating transformer-based language models into infectious disease modelling. Text-derived features are quantified by tracking high-density clusters of sentence-level representations of Reddit posts within specific US states’ COVID-19 subreddits. We benchmark these c...


Bibliographic details
Main authors: Drinkall, F; Zohren, S; Pierrehumbert, JB
Format: Conference item
Language: English
Published: Association for Computational Linguistics, 2022
Abstract: We present a novel approach incorporating transformer-based language models into infectious disease modelling. Text-derived features are quantified by tracking high-density clusters of sentence-level representations of Reddit posts within specific US states’ COVID-19 subreddits. We benchmark these clustered embedding features against features extracted from other high-quality datasets. In a threshold-classification task, we show that they outperform all other feature types at predicting upward trend signals, a significant result for infectious disease modelling in areas where epidemiological data is unreliable. Subsequently, in a time-series forecasting task, we fully utilise the predictive power of the caseload and compare the relative strengths of using different supplementary datasets as covariate feature sets in a transformer-based time-series model.
Identifier: oxford-uuid:8f62455e-9cec-4039-bcb4-d554f8f354c4
Institution: University of Oxford