Two-Stage Dimensionality Reduction for Social Media Engagement Classification

Bibliographic Details
Main Authors: Jose Luis Vieira Sobrinho, Flavio Henrique Teles Vieira, Alisson Assis Cardoso
Format: Article
Language: English
Published: MDPI AG, 2024-02-01
Series: Applied Sciences
Subjects: dimensionality reduction, classification, optimization
Online Access: https://www.mdpi.com/2076-3417/14/3/1269
Collection: DOAJ (Directory of Open Access Journals), record doaj.art-3048d4a199d146d5becf3b540853e2d9
ISSN: 2076-3417
Volume: 14, Issue: 3, Article: 1269
DOI: 10.3390/app14031269
Affiliation (all authors): School of Electrical, Mechanical, and Computer Engineering, Federal University of Goias, Goiânia 74605-020, Brazil

Abstract

The high dimensionality of real-life datasets is one of the biggest challenges in machine learning. Because higher-dimensional input data demand more computational resources, the higher the dimension of the input data, the more difficult the learning task becomes, a phenomenon commonly referred to as the curse of dimensionality. Building on this premise, we propose a two-stage dimensionality reduction (TSDR) method for data classification. The first stage extracts high-quality features into a new subset by maximizing the pairwise separation probability, with the aim of avoiding overlap between nearby instances from different classes, a situation known as the class masking problem. The second stage transforms the resulting subset into a reduced final space in a way that maximizes the distance between the cluster centers of different classes while minimizing the dispersion of instances within the same class. In this way, the second stage aims to improve the accuracy of the subsequent classifier by lowering its sensitivity to an imbalanced distribution of instances across classes. Experiments on benchmark and social media datasets show the promise of the proposed method relative to some well-established algorithms, especially for social media engagement classification.
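The abstract describes TSDR only in terms of its two objectives, so the following is a minimal sketch of those objectives under stated assumptions, not the authors' implementation. It assumes a two-class problem. For stage 1, the paper's pairwise separation probability is replaced by a hypothetical proxy, `pairwise_separation_score`: the per-feature AUC, meaning the probability that a randomly drawn pair of instances from different classes is correctly ordered along that feature. The sketch also selects the top `k` original features rather than extracting new ones. For stage 2, Fisher's linear discriminant stands in for the paper's transform, because it maximizes the distance between projected class means relative to the within-class scatter. All function names (`stage1_select`, `stage2_fisher`, `fit_tsdr`, `predict`) and the toy data are illustrative.

```python
from itertools import product


def pairwise_separation_score(values, labels):
    """Hypothetical proxy for a per-feature 'pairwise separation probability'.

    Probability that a random cross-class pair (one class-1, one class-0
    instance) is strictly ordered along this feature, ties counted as half.
    This is the Mann-Whitney / AUC statistic, folded so that 0.5 means no
    separation and 1.0 means perfect separation in either direction.
    """
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in product(pos, neg))
    auc = wins / (len(pos) * len(neg))
    return max(auc, 1.0 - auc)


def stage1_select(X, y, k):
    """Stage 1 stand-in: keep the k features with the highest separation score."""
    n_features = len(X[0])
    scores = [pairwise_separation_score([row[f] for row in X], y) for f in range(n_features)]
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return sorted(ranked[:k])


def solve(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def stage2_fisher(Z, y, ridge=1e-6):
    """Stage 2 stand-in: Fisher direction w = (S_w + ridge*I)^-1 (m1 - m0).

    Maximizes the squared distance between projected class means relative
    to the within-class scatter S_w (the ridge keeps S_w invertible).
    """
    d = len(Z[0])
    groups = {c: [z for z, lab in zip(Z, y) if lab == c] for c in (0, 1)}
    means = {c: column_means(groups[c]) for c in (0, 1)}
    s_w = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for c in (0, 1):
        for z in groups[c]:
            diff = [zi - mi for zi, mi in zip(z, means[c])]
            for i in range(d):
                for j in range(d):
                    s_w[i][j] += diff[i] * diff[j]
    w = solve(s_w, [a - b for a, b in zip(means[1], means[0])])
    return w, means


def fit_tsdr(X, y, k):
    """Hypothetical two-stage pipeline: select k features, then project to 1-D."""
    selected = stage1_select(X, y, k)
    Z = [[row[f] for f in selected] for row in X]
    w, means = stage2_fisher(Z, y)

    def project(v):
        return sum(wi * vi for wi, vi in zip(w, v))

    # Nearest-projected-mean rule: threshold halfway between the class means.
    threshold = 0.5 * (project(means[0]) + project(means[1]))
    return selected, w, threshold


def predict(model, X):
    selected, w, threshold = model
    scores = [sum(wi * row[f] for wi, f in zip(w, selected)) for row in X]
    # S_w + ridge*I is positive definite, so class 1's mean projects above class 0's.
    return [1 if s > threshold else 0 for s in scores]


# Toy data: features 0 and 1 separate the classes, feature 2 is noise.
X = [[0.0, 1.0, 5.0], [0.5, 1.2, 3.0], [1.0, 0.8, 4.0], [0.2, 1.1, 6.0],
     [3.0, 2.0, 5.5], [3.5, 2.3, 3.5], [2.8, 1.9, 4.5], [3.2, 2.1, 4.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_tsdr(X, y, k=2)
print(model[0])                                            # [0, 1]
print(predict(model, [[0.3, 1.0, 9.0], [3.1, 2.2, 0.0]]))  # [0, 1]
```

Folding the AUC with `max(auc, 1 - auc)` lets a feature count as separating whether larger values point to class 1 or to class 0. The small `ridge` term is an added assumption that keeps the within-class scatter matrix invertible when only a few instances or collinear features are available.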