Data Analysis of the Web News Headlines based on Natural Language Processing


Bibliographic Details
Main Authors: Hrvoje Karna, Maja Braovic, Linda Vickovic, Damir Krstinic
Format: Article
Language: English
Published: Croatian Communications and Information Society (CCIS) 2023-06-01
Series: Journal of Communications Software and Systems
Subjects: data mining, information extraction, natural language processing, news portals, text analysis
Online Access: https://jcoms.fesb.unist.hr/10.24138/jcomss-2023-0047/
author Hrvoje Karna
Maja Braovic
Linda Vickovic
Damir Krstinic
collection DOAJ
description This paper explores the problem of media content data analysis, focusing on the phenomenon of vaccination, which is closely related to the COVID-19 pandemic. The presented research extends previous work but differs from it in two main respects. Firstly, the text corpus submitted to analysis has been considerably enlarged. Secondly, the previous data analysis was performed on the body of the posts, whereas the present analysis focuses on the most prominent part of news posts, their headlines. This change from body to headline analysis was motivated by significant differences in their characteristics and by the fact that most people read only headlines. The described data acquisition uses an advanced content collection approach, followed by a modeling process during which a set of natural language processing algorithms was applied. To enable comparison, the modeling phase uses the same set of algorithms as in the previous work. The main contributions of the work are: i) approaching the problem from a new perspective, ii) applying a more efficient method of data collection, and, crucially, iii) enabling the comparison of analysis results for individual parts of the content, which provides a comprehensive insight into the characteristics of news posts.
first_indexed 2024-03-13T01:41:37Z
format Article
id doaj.art-44aac49c0c934868abee4dac1bb40cde
institution Directory Open Access Journal
issn 1845-6421
1846-6079
language English
last_indexed 2024-03-13T01:41:37Z
publishDate 2023-06-01
publisher Croatian Communications and Information Society (CCIS)
record_format Article
series Journal of Communications Software and Systems
spelling doaj.art-44aac49c0c934868abee4dac1bb40cde | 2023-07-03T13:36:05Z | eng | Croatian Communications and Information Society (CCIS) | Journal of Communications Software and Systems | 1845-6421, 1846-6079 | 2023-06-01 | vol. 19, no. 2, pp. 158-167 | 10.24138/jcomss-2023-0047 | Data Analysis of the Web News Headlines based on Natural Language Processing | Hrvoje Karna, Maja Braovic, Linda Vickovic, Damir Krstinic | https://jcoms.fesb.unist.hr/10.24138/jcomss-2023-0047/ | data mining, information extraction, natural language processing, news portals, text analysis
title Data Analysis of the Web News Headlines based on Natural Language Processing
topic data mining
information extraction
natural language processing
news portals
text analysis
url https://jcoms.fesb.unist.hr/10.24138/jcomss-2023-0047/