BillionCOV: An enriched billion-scale collection of COVID-19 tweets for efficient hydration


Bibliographic Details
Main Authors: Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera
Format: Article
Language: English
Published: Elsevier, 2023-06-01
Series: Data in Brief
Subjects: Pandemic discourse; COVID-19 conversations; Crisis informatics; Twitter conversations
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340923003487
author Rabindra Lamsal
Maria Rodriguez Read
Shanika Karunasekera
collection DOAJ
description The COVID-19 pandemic has introduced new norms, such as social distancing, face masks, quarantine, lockdowns, travel restrictions, work/study from home, and business closures, to name a few. The pandemic’s seriousness has made people vocal on social media, especially on microblogs such as Twitter. Since the early days of the outbreak, researchers have been collecting and sharing large-scale datasets of COVID-19 tweets. However, the existing datasets carry issues related to proportion and redundancy. We report that more than 500 million tweet identifiers point to deleted or protected tweets. To address these issues, this paper introduces an enriched global billion-scale English-language COVID-19 tweets dataset, BillionCOV, which contains 1.4 billion tweets originating from 240 countries and territories between October 2019 and April 2022. Importantly, BillionCOV enables researchers to filter tweet identifiers for efficient hydration. We anticipate that a dataset of this scale, with global scope and extended temporal coverage, will aid in obtaining a thorough understanding of the pandemic’s conversational dynamics.
first_indexed 2024-03-13T03:57:33Z
format Article
id doaj.art-0b647d2d6c5d4126a6cee289196b154d
institution Directory Open Access Journal
issn 2352-3409
language English
last_indexed 2024-03-13T03:57:33Z
publishDate 2023-06-01
publisher Elsevier
record_format Article
series Data in Brief
source Data in Brief, vol. 48, article 109229 (Elsevier, 2023-06-01; ISSN 2352-3409)
affiliation School of Computing and Information Systems, The University of Melbourne, Melbourne, Victoria 3010, Australia (all three authors; Rabindra Lamsal is the corresponding author)
title BillionCOV: An enriched billion-scale collection of COVID-19 tweets for efficient hydration
topic Pandemic discourse
COVID-19 conversations
Crisis informatics
Twitter conversations
url http://www.sciencedirect.com/science/article/pii/S2352340923003487