A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research—An International Collaboration

As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetics, and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is all...

Bibliographic Details
Main Authors: Juan M. Banda, Ramya Tekumalla, Guanyu Wang, Jingyuan Yu, Tuo Liu, Yuning Ding, Ekaterina Artemova, Elena Tutubalina, Gerardo Chowell
Format: Article
Language: English
Published: MDPI AG 2021-08-01
Series: Epidemiologia
Subjects: public datasets, open science, COVID-19, social media, data sources
Online Access: https://www.mdpi.com/2673-3986/2/3/24
collection DOAJ
description As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetics, and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated on the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics of such a unique worldwide event in biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 1.12 billion tweets, growing daily, related to COVID-19 chatter generated from 1 January 2020 to 27 June 2021 at the time of writing. This data source provides a freely available additional data source for researchers worldwide to conduct a wide and diverse number of research projects, such as epidemiological analyses, emotional and mental responses to social distancing measures, the identification of sources of misinformation, stratified measurement of sentiment towards the pandemic in near real time, among many others.
first_indexed 2024-03-10T07:41:28Z
id doaj.art-498e8095f708404c975307c76dd95998
institution Directory Open Access Journal
issn 2673-3986
last_indexed 2024-03-10T07:41:28Z
spelling doaj.art-498e8095f708404c975307c76dd95998
2023-11-22T12:59:00Z
eng
MDPI AG
Epidemiologia
2673-3986
2021-08-01
Volume 2, Issue 3, pp. 315-324
DOI: 10.3390/epidemiologia2030024
Juan M. Banda (Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA)
Ramya Tekumalla (Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA)
Guanyu Wang (Missouri School of Journalism, University of Missouri, Columbia, MO 65201, USA)
Jingyuan Yu (Department of Social Psychology, Universitat Autònoma de Barcelona, 08035 Barcelona, Spain)
Tuo Liu (Department of Psychology, Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany)
Yuning Ding (Language Technology Lab, Universität Duisburg-Essen, 47057 Duisburg, Germany)
Ekaterina Artemova (Faculty of Computer Science, Higher School of Economics—National Research University, 101000 Moscow, Russia)
Elena Tutubalina (Faculty of Chemistry, Kazan Federal University, 420008 Kazan, Russia)
Gerardo Chowell (Department of Population Health Sciences, Georgia State University, Atlanta, GA 30303, USA)
title A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research—An International Collaboration
topic public datasets
open science
COVID-19
social media
data sources
url https://www.mdpi.com/2673-3986/2/3/24