Two-Stage Dimensionality Reduction for Social Media Engagement Classification
The high dimensionality of real-life datasets is one of the biggest challenges in the machine learning field. Due to the increased need for computational resources, the higher the dimension of the input data is, the more difficult the learning task will be—a phenomenon commonly referred to as the cu...
Main Authors: | Jose Luis Vieira Sobrinho, Flavio Henrique Teles Vieira, Alisson Assis Cardoso |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-02-01 |
Series: | Applied Sciences |
Subjects: | dimensionality reduction; classification; optimization |
Online Access: | https://www.mdpi.com/2076-3417/14/3/1269 |
_version_ | 1827354833390141440 |
---|---|
author | Jose Luis Vieira Sobrinho; Flavio Henrique Teles Vieira; Alisson Assis Cardoso |
author_facet | Jose Luis Vieira Sobrinho; Flavio Henrique Teles Vieira; Alisson Assis Cardoso |
author_sort | Jose Luis Vieira Sobrinho |
collection | DOAJ |
description | The high dimensionality of real-life datasets is one of the biggest challenges in the machine learning field. Due to the increased need for computational resources, the higher the dimension of the input data is, the more difficult the learning task will be—a phenomenon commonly referred to as the curse of dimensionality. Laying the paper’s foundation based on this premise, we propose a two-stage dimensionality reduction (TSDR) method for data classification. The first stage extracts high-quality features to a new subset by maximizing the pairwise separation probability, with the aim of avoiding overlap between individuals from different classes that are close to one another, also known as the class masking problem. The second stage takes the previous resulting subset and transforms it into a reduced final space in a way that maximizes the distance between the cluster centers of different classes while also minimizing the dispersion of instances within the same class. Hence, the second stage aims to improve the accuracy of the succeeding classifier by lowering its sensitivity to an imbalanced distribution of instances between different classes. Experiments on benchmark and social media datasets show how promising the proposed method is over some well-established algorithms, especially regarding social media engagement classification. |
first_indexed | 2024-03-08T04:00:20Z |
format | Article |
id | doaj.art-3048d4a199d146d5becf3b540853e2d9 |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-08T04:00:20Z |
publishDate | 2024-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-3048d4a199d146d5becf3b540853e2d9; 2024-02-09T15:08:27Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2024-02-01; vol. 14, no. 3, art. 1269; doi:10.3390/app14031269; Two-Stage Dimensionality Reduction for Social Media Engagement Classification; Jose Luis Vieira Sobrinho, Flavio Henrique Teles Vieira, Alisson Assis Cardoso (all: School of Electrical, Mechanical, and Computer Engineering, Federal University of Goias, Goiânia 74605-020, Brazil); https://www.mdpi.com/2076-3417/14/3/1269; dimensionality reduction; classification; optimization |
spellingShingle | Jose Luis Vieira Sobrinho; Flavio Henrique Teles Vieira; Alisson Assis Cardoso; Two-Stage Dimensionality Reduction for Social Media Engagement Classification; Applied Sciences; dimensionality reduction; classification; optimization |
title | Two-Stage Dimensionality Reduction for Social Media Engagement Classification |
title_sort | two stage dimensionality reduction for social media engagement classification |
topic | dimensionality reduction; classification; optimization |
url | https://www.mdpi.com/2076-3417/14/3/1269 |
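
The description field above says the second stage seeks a reduced space that maximizes the distance between class centers while minimizing the dispersion of instances within each class. The record does not give the authors' formulation, so the sketch below is not the TSDR method; it only illustrates that kind of objective with the classical Fisher criterion (between-class scatter relative to within-class scatter). All function names and the `ridge` parameter are hypothetical, chosen for illustration.

```python
import numpy as np


def scatter_matrices(X, y):
    """Return (S_b, S_w): between-class and within-class scatter of X."""
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        # Between-class term: class center vs. overall center, weighted by size.
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += Xc.shape[0] * (diff @ diff.T)
        # Within-class term: dispersion of instances around their class center.
        centered = Xc - mean_c
        S_w += centered.T @ centered
    return S_b, S_w


def fisher_projection(X, y, n_components, ridge=1e-6):
    """Projection matrix from the leading eigenvectors of S_w^{-1} S_b.

    `ridge` (an assumption of this sketch) regularizes S_w so the linear
    solve stays well defined when S_w is close to singular.
    """
    S_b, S_w = scatter_matrices(X, y)
    d = X.shape[1]
    M = np.linalg.solve(S_w + ridge * np.eye(d), S_b)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real


def separation_score(Z, y):
    """Ratio trace(S_b) / trace(S_w); larger means better-separated classes."""
    S_b, S_w = scatter_matrices(Z, y)
    return np.trace(S_b) / np.trace(S_w)


if __name__ == "__main__":
    # Synthetic data (not from the paper): two classes that differ along
    # the first feature only; the remaining features are noise.
    rng = np.random.default_rng(0)
    X0 = rng.normal(0.0, 1.0, size=(50, 5))
    X1 = rng.normal(0.0, 1.0, size=(50, 5))
    X1[:, 0] += 5.0
    X = np.vstack([X0, X1])
    y = np.array([0] * 50 + [1] * 50)

    W = fisher_projection(X, y, n_components=1)
    Z = X @ W
    print("score in original space:", separation_score(X, y))
    print("score after projection: ", separation_score(Z, y))
```

On data like this, the projected score should exceed the score in the original space, because the projection discards directions that add within-class dispersion without separating the class centers.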