Forecasting COVID-19 caseloads using unsupervised embedding clusters of social media posts
We present a novel approach incorporating transformer-based language models into infectious disease modelling. Text-derived features are quantified by tracking high-density clusters of sentence-level representations of Reddit posts within specific US states’ COVID-19 subreddits. We benchmark these c...
Main authors: | Drinkall, F; Zohren, S; Pierrehumbert, JB |
---|---|
Format: | Conference item |
Language: | English |
Published: | Association for Computational Linguistics, 2022 |
author | Drinkall, F; Zohren, S; Pierrehumbert, JB |
---|---|
collection | OXFORD |
description | We present a novel approach incorporating transformer-based language models into infectious disease modelling. Text-derived features are quantified by tracking high-density clusters of sentence-level representations of Reddit posts within specific US states’ COVID-19 subreddits. We benchmark these clustered embedding features against features extracted from other high-quality datasets. In a threshold-classification task, we show that they outperform all other feature types at predicting upward trend signals, a significant result for infectious disease modelling in areas where epidemiological data is unreliable. Subsequently, in a time-series forecasting task, we fully utilise the predictive power of the caseload and compare the relative strengths of using different supplementary datasets as covariate feature sets in a transformer-based time-series model. |
first_indexed | 2024-03-07T08:18:54Z |
format | Conference item |
id | oxford-uuid:8f62455e-9cec-4039-bcb4-d554f8f354c4 |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T08:18:54Z |
publishDate | 2022 |
publisher | Association for Computational Linguistics |
title | Forecasting COVID-19 caseloads using unsupervised embedding clusters of social media posts |