Annotated dataset of history-related tweets
Main Authors: | Yasunobu Sumikawa, Adam Jatowt |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2021-10-01 |
Series: | Data in Brief |
Subjects: | Digital history; Tweets; Hashtags; Hashtag categories; Temporal analysis |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340921006284 |
_version_ | 1818983268688592896 |
---|---|
author | Yasunobu Sumikawa, Adam Jatowt |
collection | DOAJ |
description | In this article, we present a dataset of history-related content obtained from social media. It contains hashtags and the tweets that include them, the outputs of third-party tools applied to those tweets (extracted entities, years, and URL categories), and the categories of the history-related hashtags used to crawl the tweets. We collected the tweets through the official Twitter API using hashtag-based crawling, performed from March 2016 to July 2018. During crawling, we applied a bootstrapping approach: an iterative process that collects tweets using a small set of seed hashtags and then manually inspects newly acquired hashtags that co-occur with the seed hashtags, adding those that are related to history. In total, we collected 147 history-related hashtags and 2,370,252 tweets. After manually examining the collected hashtags, we defined 6 categories for them. The dataset could be useful for further analysis of how people refer to history on Twitter, for collecting new history-related tweets, for training classifiers to detect history-related tweets, or for further investigation of the proposed hashtag categories. |
first_indexed | 2024-12-20T18:00:25Z |
format | Article |
id | doaj.art-882224071cad49be925be7aa09f3d7a3 |
institution | Directory Open Access Journal |
issn | 2352-3409 |
language | English |
last_indexed | 2024-12-20T18:00:25Z |
publishDate | 2021-10-01 |
publisher | Elsevier |
record_format | Article |
series | Data in Brief |
spelling | doaj.art-882224071cad49be925be7aa09f3d7a3 (2022-12-21T19:30:39Z); eng; Elsevier; Data in Brief; ISSN 2352-3409; 2021-10-01; vol. 38, 107344; Annotated dataset of history-related tweets; Yasunobu Sumikawa (Takushoku University, Japan; corresponding author), Adam Jatowt (University of Innsbruck, Austria); http://www.sciencedirect.com/science/article/pii/S2352340921006284; Digital history; Tweets; Hashtags; Hashtag categories; Temporal analysis |
title | Annotated dataset of history-related tweets |
topic | Digital history; Tweets; Hashtags; Hashtag categories; Temporal analysis |
url | http://www.sciencedirect.com/science/article/pii/S2352340921006284 |
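The bootstrapping crawl summarized in the description can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `fetch_tweets` is a hypothetical stand-in for hashtag queries against the Twitter API (each tweet is reduced to its set of hashtags), and `is_history_related` stands in for the manual inspection step. Lowercasing hashtags, treating newly accepted hashtags as the next round's seeds, and the `max_rounds` cap are also assumptions.

```python
from collections import Counter
from typing import Callable, Iterable


def bootstrap_hashtags(
    seed_hashtags: set[str],
    fetch_tweets: Callable[[str], Iterable[set[str]]],  # hypothetical API stand-in
    is_history_related: Callable[[str], bool],          # stands in for manual inspection
    max_rounds: int = 10,                               # assumed safety cap
) -> set[str]:
    """Iteratively expand a seed set of hashtags via co-occurrence (sketch)."""
    accepted = {h.lower() for h in seed_hashtags}
    rejected: set[str] = set()
    frontier = set(accepted)  # hashtags whose tweets have not been fetched yet

    for _ in range(max_rounds):
        if not frontier:
            break  # no new hashtags were accepted in the previous round
        # Count hashtags that co-occur with the current frontier and are still undecided.
        co_occurring: Counter[str] = Counter()
        for tag in frontier:
            for tweet_tags in fetch_tweets(tag):
                for other in tweet_tags:
                    other = other.lower()
                    if other not in accepted and other not in rejected:
                        co_occurring[other] += 1
        # "Inspect" each candidate, most frequent first; keep only history-related ones.
        new_tags: set[str] = set()
        for candidate, _count in co_occurring.most_common():
            if is_history_related(candidate):
                new_tags.add(candidate)
            else:
                rejected.add(candidate)
        accepted |= new_tags
        frontier = new_tags

    return accepted


# Toy usage with an invented in-memory corpus (illustrative data only).
toy_corpus = {
    "history": [{"history", "onthisday"}, {"history", "ww2"}],
    "onthisday": [{"onthisday", "history", "sale"}],
    "ww2": [{"ww2", "history"}],
    "sale": [{"sale", "discount"}],
}


def fetch(tag: str) -> list[set[str]]:
    return toy_corpus.get(tag, [])


relevant = {"onthisday", "ww2"}
result = bootstrap_hashtags({"history"}, fetch, lambda t: t in relevant)
print(sorted(result))
```

In the toy run, `#sale` co-occurs with an accepted hashtag but is rejected at the inspection step, so its tweets (and `#discount`) are never crawled; this is the role the manual inspection plays in keeping the crawl focused on history-related content.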