BillionCOV: An enriched billion-scale collection of COVID-19 tweets for efficient hydration
The COVID-19 pandemic has introduced new norms, such as social distancing, face masks, quarantine, lockdowns, travel restrictions, work/study from home, and business closures, to name a few. The pandemic’s seriousness has made people vocal on social media, especially on microblogs such as Twitter. S...
Main Authors: | Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2023-06-01 |
Series: | Data in Brief |
Subjects: | Pandemic discourse, COVID-19 conversations, Crisis informatics, Twitter conversations |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340923003487 |
_version_ | 1797797980680486912 |
---|---|
author | Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera |
author_facet | Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera |
author_sort | Rabindra Lamsal |
collection | DOAJ |
description | The COVID-19 pandemic has introduced new norms, such as social distancing, face masks, quarantine, lockdowns, travel restrictions, work/study from home, and business closures, to name a few. The pandemic’s seriousness has made people vocal on social media, especially on microblogs such as Twitter. Since the early days of the outbreak, researchers have been collecting and sharing large-scale datasets of COVID-19 tweets. However, the existing datasets carry issues related to proportion and redundancy. We report that more than 500 million tweet identifiers point to deleted or protected tweets. To address these issues, this paper introduces an enriched global billion-scale English-language COVID-19 tweets dataset, BillionCOV, which contains 1.4 billion tweets originating from 240 countries and territories between October 2019 and April 2022. Importantly, BillionCOV enables researchers to filter tweet identifiers for efficient hydration. We anticipate that a dataset of this scale, with global scope and extended temporal coverage, will aid in obtaining a thorough understanding of the pandemic’s conversational dynamics. |
first_indexed | 2024-03-13T03:57:33Z |
format | Article |
id | doaj.art-0b647d2d6c5d4126a6cee289196b154d |
institution | Directory of Open Access Journals |
issn | 2352-3409 |
language | English |
last_indexed | 2024-03-13T03:57:33Z |
publishDate | 2023-06-01 |
publisher | Elsevier |
record_format | Article |
series | Data in Brief |
spelling | doaj.art-0b647d2d6c5d4126a6cee289196b154d; 2023-06-22T05:04:04Z; eng; Elsevier; Data in Brief; 2352-3409; 2023-06-01; 48; 109229; BillionCOV: An enriched billion-scale collection of COVID-19 tweets for efficient hydration; Rabindra Lamsal 0; Maria Rodriguez Read 1; Shanika Karunasekera 2; Corresponding author.; School of Computing and Information Systems, The University of Melbourne, Melbourne, Victoria 3010, Australia; School of Computing and Information Systems, The University of Melbourne, Melbourne, Victoria 3010, Australia; School of Computing and Information Systems, The University of Melbourne, Melbourne, Victoria 3010, Australia; The COVID-19 pandemic has introduced new norms, such as social distancing, face masks, quarantine, lockdowns, travel restrictions, work/study from home, and business closures, to name a few. The pandemic’s seriousness has made people vocal on social media, especially on microblogs such as Twitter. Since the early days of the outbreak, researchers have been collecting and sharing large-scale datasets of COVID-19 tweets. However, the existing datasets carry issues related to proportion and redundancy. We report that more than 500 million tweet identifiers point to deleted or protected tweets. To address these issues, this paper introduces an enriched global billion-scale English-language COVID-19 tweets dataset, BillionCOV, which contains 1.4 billion tweets originating from 240 countries and territories between October 2019 and April 2022. Importantly, BillionCOV enables researchers to filter tweet identifiers for efficient hydration. We anticipate that a dataset of this scale, with global scope and extended temporal coverage, will aid in obtaining a thorough understanding of the pandemic’s conversational dynamics.; http://www.sciencedirect.com/science/article/pii/S2352340923003487; Pandemic discourse; COVID-19 conversations; Crisis informatics; Twitter conversations |
spellingShingle | Rabindra Lamsal; Maria Rodriguez Read; Shanika Karunasekera; BillionCOV: An enriched billion-scale collection of COVID-19 tweets for efficient hydration; Data in Brief; Pandemic discourse; COVID-19 conversations; Crisis informatics; Twitter conversations |
title | BillionCOV: An enriched billion-scale collection of COVID-19 tweets for efficient hydration |
title_full | BillionCOV: An enriched billion-scale collection of COVID-19 tweets for efficient hydration |
title_fullStr | BillionCOV: An enriched billion-scale collection of COVID-19 tweets for efficient hydration |
title_full_unstemmed | BillionCOV: An enriched billion-scale collection of COVID-19 tweets for efficient hydration |
title_short | BillionCOV: An enriched billion-scale collection of COVID-19 tweets for efficient hydration |
title_sort | billioncov an enriched billion scale collection of covid 19 tweets for efficient hydration |
topic | Pandemic discourse, COVID-19 conversations, Crisis informatics, Twitter conversations |
url | http://www.sciencedirect.com/science/article/pii/S2352340923003487 |
work_keys_str_mv | AT rabindralamsal billioncovanenrichedbillionscalecollectionofcovid19tweetsforefficienthydration AT mariarodriguezread billioncovanenrichedbillionscalecollectionofcovid19tweetsforefficienthydration AT shanikakarunasekera billioncovanenrichedbillionscalecollectionofcovid19tweetsforefficienthydration |