Investigation of the Misinformation about COVID-19 on YouTube Using Topic Modeling, Sentiment Analysis, and Language Analysis
The work presented in this paper makes multiple scientific contributions with a specific focus on the analysis of misinformation about COVID-19 on YouTube. First, the results of topic modeling performed on the video descriptions of YouTube videos containing misinformation about COVID-19 revealed fou...
Main Authors: | Nirmalya Thakur, Shuqi Cui, Victoria Knieling, Karam Khanna, Mingchen Shao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-02-01 |
Series: | Computation |
Subjects: | COVID-19, YouTube, misinformation, big data, data analysis, topic modeling |
Online Access: | https://www.mdpi.com/2079-3197/12/2/28 |
_version_ | 1797298584551424000 |
---|---|
author | Nirmalya Thakur, Shuqi Cui, Victoria Knieling, Karam Khanna, Mingchen Shao |
author_sort | Nirmalya Thakur |
collection | DOAJ |
description | The work presented in this paper makes multiple scientific contributions with a specific focus on the analysis of misinformation about COVID-19 on YouTube. First, the results of topic modeling performed on the video descriptions of YouTube videos containing misinformation about COVID-19 revealed four distinct themes or focus areas: *Promotion and Outreach Efforts*, *Treatment for COVID-19*, *Conspiracy Theories Regarding COVID-19*, and *COVID-19 and Politics*. Second, the results of topic-specific sentiment analysis revealed the sentiment associated with each of these themes. For the videos belonging to the theme of *Promotion and Outreach Efforts*, 45.8% were neutral, 39.8% were positive, and 14.4% were negative. For the videos belonging to the theme of *Treatment for COVID-19*, 38.113% were positive, 31.343% were neutral, and 30.544% were negative. For the videos belonging to the theme of *Conspiracy Theories Regarding COVID-19*, 46.9% were positive, 31.0% were neutral, and 22.1% were negative. For the videos belonging to the theme of *COVID-19 and Politics*, 35.70% were positive, 32.86% were negative, and 31.44% were neutral. Third, topic-specific language analysis was performed to detect the various languages in which the video descriptions for each topic were published on YouTube. This analysis revealed multiple novel insights. For instance, for all the themes, English and Spanish were the most widely used and second most widely used languages, respectively. Fourth, the patterns of sharing these videos on other social media channels, such as Facebook and Twitter, were also investigated. The results revealed that videos containing video descriptions in English were shared the highest number of times on Facebook and Twitter. Finally, correlation analysis was performed by taking into account multiple characteristics of these videos. The results revealed that the correlation between the length of the video title and the number of tweets and the correlation between the length of the video title and the number of Facebook posts were statistically significant. |
first_indexed | 2024-03-07T22:36:58Z |
format | Article |
id | doaj.art-d6c5defb8fd44ddda5da7fa8b17e8400 |
institution | Directory Open Access Journal |
issn | 2079-3197 |
language | English |
last_indexed | 2024-03-07T22:36:58Z |
publishDate | 2024-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Computation |
spelling | doaj.art-d6c5defb8fd44ddda5da7fa8b17e8400; 2024-02-23T15:12:51Z; eng; MDPI AG; Computation; ISSN 2079-3197; 2024-02-01; vol. 12, no. 2, art. 28; DOI 10.3390/computation12020028; Investigation of the Misinformation about COVID-19 on YouTube Using Topic Modeling, Sentiment Analysis, and Language Analysis; Nirmalya Thakur (Department of Computer Science, Emory University, Atlanta, GA 30322, USA); Shuqi Cui (Department of Computer Science, Emory University, Atlanta, GA 30322, USA); Victoria Knieling (Program in Linguistics, Emory University, Atlanta, GA 30322, USA); Karam Khanna (Department of Computer Science, Emory University, Atlanta, GA 30322, USA); Mingchen Shao (Department of Computer Science, Emory University, Atlanta, GA 30322, USA); https://www.mdpi.com/2079-3197/12/2/28; Keywords: COVID-19, YouTube, misinformation, big data, data analysis, topic modeling |
title | Investigation of the Misinformation about COVID-19 on YouTube Using Topic Modeling, Sentiment Analysis, and Language Analysis |
topic | COVID-19, YouTube, misinformation, big data, data analysis, topic modeling |
url | https://www.mdpi.com/2079-3197/12/2/28 |