Annotated dataset of history-related tweets

In this article, we present a dataset containing history-related content obtained from social media. It contains hashtags and tweets that include these hashtags, as well as the results of third-party tools applied to the tweets that include extracted entities, years, and URL categories, and the cate...

Bibliographic Details
Main Authors: Yasunobu Sumikawa, Adam Jatowt
Format: Article
Language: English
Published: Elsevier 2021-10-01
Series: Data in Brief
Subjects: Digital history; Tweets; Hashtags; Hashtag categories; Temporal analysis
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340921006284
author Yasunobu Sumikawa
Adam Jatowt
collection DOAJ
description In this article, we present a dataset of history-related content obtained from social media. It contains hashtags and the tweets that include them, the output of third-party tools applied to the tweets (extracted entities, years, and URL categories), and the categories of the history-related hashtags we used to crawl the tweets. We collected the tweets through Twitter's official API using hashtag-based crawling. The crawl ran from March 2016 to July 2018. During the crawl, we applied a bootstrapping approach: an iterative process of collecting tweets with a small set of seed hashtags, then manually inspecting newly acquired hashtags that co-occur with the seed hashtags and adding those that are related to history. In total, we collected 147 history-related hashtags and 2,370,252 tweets. We then defined 6 categories for the collected hashtags after manually examining them. The dataset could be useful for further analysis of how people refer to history on Twitter, for collecting new history-related tweets, for training classifiers to detect history-related tweets, or for further investigation of the proposed hashtag categories.
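The bootstrapping approach named in the description can be illustrated with a minimal Python sketch. This is not the authors' crawler: it runs over an in-memory list of tweet texts rather than the Twitter API, and the names `extract_hashtags`, `bootstrap_hashtags`, `is_history_related`, and `max_rounds` are hypothetical. The `is_history_related` callback stands in for the manual inspection step described in the record.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercase hashtags in a tweet, without the leading '#'."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def bootstrap_hashtags(tweets, seeds, is_history_related, max_rounds=10):
    """Iteratively expand a seed hashtag set (illustrative sketch only).

    tweets: list of tweet texts (assumption: already available locally).
    seeds: initial hashtags, with or without a leading '#'.
    is_history_related: callback standing in for manual inspection.
    """
    accepted = {s.lower().lstrip("#") for s in seeds}
    rejected = set()
    for _ in range(max_rounds):
        # Count unseen hashtags that co-occur with an accepted hashtag.
        candidates = Counter()
        for text in tweets:
            tags = extract_hashtags(text)
            if tags & accepted:
                candidates.update(tags - accepted - rejected)
        # "Manual inspection": keep only candidates judged history-related.
        newly_accepted = {tag for tag in candidates if is_history_related(tag)}
        rejected |= set(candidates) - newly_accepted
        if not newly_accepted:
            break  # No new hashtags were accepted, so the set is stable.
        accepted |= newly_accepted
    # Collect every tweet that carries at least one accepted hashtag.
    collected = [t for t in tweets if extract_hashtags(t) & accepted]
    return accepted, collected


if __name__ == "__main__":
    sample = [
        "Signed today in 1215 #OTD #MagnaCarta",
        "#MagnaCarta exhibit #museum #lunch",
        "Nice #lunch",
    ]
    judged = {"magnacarta", "museum"}  # hypothetical inspection outcome
    tags, kept = bootstrap_hashtags(sample, ["#OTD"], lambda t: t in judged)
    print(sorted(tags), len(kept))
```

Rejected candidates are remembered so that the same hashtag is not presented for inspection again in later rounds; that is a design choice of this sketch, not something the record specifies.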
first_indexed 2024-12-20T18:00:25Z
format Article
id doaj.art-882224071cad49be925be7aa09f3d7a3
institution Directory of Open Access Journals
issn 2352-3409
language English
last_indexed 2024-12-20T18:00:25Z
publishDate 2021-10-01
publisher Elsevier
record_format Article
series Data in Brief
spelling doaj.art-882224071cad49be925be7aa09f3d7a3 2022-12-21T19:30:39Z eng Elsevier Data in Brief 2352-3409 2021-10-01 38 107344 Annotated dataset of history-related tweets Yasunobu Sumikawa (Takushoku University, Japan; corresponding author), Adam Jatowt (University of Innsbruck, Austria) http://www.sciencedirect.com/science/article/pii/S2352340921006284 Digital history; Tweets; Hashtags; Hashtag categories; Temporal analysis
title Annotated dataset of history-related tweets
topic Digital history
Tweets
Hashtags
Hashtag categories
Temporal analysis
url http://www.sciencedirect.com/science/article/pii/S2352340921006284