Investigation of the Misinformation about COVID-19 on YouTube Using Topic Modeling, Sentiment Analysis, and Language Analysis


Bibliographic Details
Main Authors: Nirmalya Thakur, Shuqi Cui, Victoria Knieling, Karam Khanna, Mingchen Shao
Format: Article
Language: English
Published: MDPI AG 2024-02-01
Series: Computation
Subjects:
Online Access: https://www.mdpi.com/2079-3197/12/2/28
collection DOAJ
description The work presented in this paper makes multiple scientific contributions with a specific focus on the analysis of misinformation about COVID-19 on YouTube. First, the results of topic modeling performed on the video descriptions of YouTube videos containing misinformation about COVID-19 revealed four distinct themes or focus areas: Promotion and Outreach Efforts, Treatment for COVID-19, Conspiracy Theories Regarding COVID-19, and COVID-19 and Politics. Second, the results of topic-specific sentiment analysis revealed the sentiment associated with each of these themes. For the videos belonging to the theme of Promotion and Outreach Efforts, 45.8% were neutral, 39.8% were positive, and 14.4% were negative. For the videos belonging to the theme of Treatment for COVID-19, 38.113% were positive, 31.343% were neutral, and 30.544% were negative. For the videos belonging to the theme of Conspiracy Theories Regarding COVID-19, 46.9% were positive, 31.0% were neutral, and 22.1% were negative. For the videos belonging to the theme of COVID-19 and Politics, 35.70% were positive, 32.86% were negative, and 31.44% were neutral. Third, topic-specific language analysis was performed to detect the various languages in which the video descriptions for each topic were published on YouTube. This analysis revealed multiple novel insights. For instance, for all the themes, English and Spanish were the most widely used and second most widely used languages, respectively. Fourth, the patterns of sharing these videos on other social media channels, such as Facebook and Twitter, were also investigated. The results revealed that videos containing video descriptions in English were shared the highest number of times on Facebook and Twitter. Finally, correlation analysis was performed by taking into account multiple characteristics of these videos. The results revealed that the correlation between the length of the video title and the number of tweets and the correlation between the length of the video title and the number of Facebook posts were statistically significant.
first_indexed 2024-03-07T22:36:58Z
id doaj.art-d6c5defb8fd44ddda5da7fa8b17e8400
institution Directory Open Access Journal
issn 2079-3197
last_indexed 2024-03-07T22:36:58Z
spelling doaj.art-d6c5defb8fd44ddda5da7fa8b17e8400; 2024-02-23T15:12:51Z; eng; MDPI AG; Computation; ISSN 2079-3197; 2024-02-01; vol. 12, no. 2, art. 28; DOI 10.3390/computation12020028
Nirmalya Thakur: Department of Computer Science, Emory University, Atlanta, GA 30322, USA
Shuqi Cui: Department of Computer Science, Emory University, Atlanta, GA 30322, USA
Victoria Knieling: Program in Linguistics, Emory University, Atlanta, GA 30322, USA
Karam Khanna: Department of Computer Science, Emory University, Atlanta, GA 30322, USA
Mingchen Shao: Department of Computer Science, Emory University, Atlanta, GA 30322, USA
topic COVID-19
YouTube
misinformation
big data
data analysis
topic modeling