Optimizing sentiment analysis of Indonesian texts: Enhancing deep learning models with genetic algorithm-based feature selection

Bibliographic Details
Main Authors: Siti Mujilahwati, Noor Zuraidin Mohd Safar, Ku Muhammad Naim Ku Khalif, Nasyitah Ghazalli
Format: Article
Language: English
Published: Penerbit UTHM 2024
Online Access: http://umpir.ump.edu.my/id/eprint/43886/1/Optimizing%20sentiment%20analysis%20of%20indonesian%20texts.pdf
Description
Summary: Automatic text classification is used in many real-world applications, including spam filtering, sentiment analysis, and news categorization. The primary challenge in text representation is high dimensionality, which increases model complexity and the risk of overfitting. To address this, feature selection (FS) is performed during data pre-processing to improve the model's learning accuracy and efficiency. This study examines the optimization of Indonesian text sentiment analysis by integrating genetic algorithm (GA)-based feature selection with deep learning models. Using GA with SVM-based fitness evaluation to reduce the data from 41,140 to 20,769 features increased accuracy by 8.10% for SVM, 36.1% for Naïve Bayes, 7.82% for LSTM, 5.47% for DNN, and 6.25% for CNN. Of the three deep learning models, LSTM achieved the highest accuracy, at 91.41%, and its computation time fell by nearly 50%.
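
The GA-based feature selection described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no GA settings, so the population size, mutation rate, size penalty, and the helper names (`make_data`, `centroid_accuracy`, `fitness`, `ga_select`) are all assumptions. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVM that the study uses for fitness evaluation, and a small synthetic dataset stands in for the Indonesian text features.

```python
import random

def make_data(rng, n_samples=120, n_features=20, n_informative=4):
    """Toy binary dataset (assumption): only the first n_informative features carry signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # balanced classes
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def centroid_accuracy(X_tr, y_tr, X_te, y_te, mask):
    """Validation accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # an empty feature set cannot classify anything
    cents = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(X_tr, y_tr) if lab == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    correct = 0
    for x, lab in zip(X_te, y_te):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(idx, cents[c])) for c in cents}
        correct += min(dist, key=dist.get) == lab
    return correct / len(y_te)

def fitness(mask, data, penalty=0.1):
    """Accuracy minus a small penalty on the fraction of features kept (assumed form)."""
    return centroid_accuracy(*data, mask) - penalty * sum(mask) / len(mask)

def ga_select(data, rng, pop_size=20, generations=10, p_mut=0.05):
    """Evolve binary masks; bit j == 1 means feature j is kept."""
    n = len(data[0][0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1] * n  # seed with the full feature set as a baseline
    best = max(pop, key=lambda m: fitness(m, data))
    for _ in range(generations):
        nxt = [best[:]]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            # Tournament selection of two parents
            p1 = max(rng.sample(pop, 3), key=lambda m: fitness(m, data))
            p2 = max(rng.sample(pop, 3), key=lambda m: fitness(m, data))
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation
            nxt.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = nxt
        best = max(pop, key=lambda m: fitness(m, data))
    return best

rng = random.Random(0)
X, y = make_data(rng)
data = (X[:80], y[:80], X[80:], y[80:])  # train / validation split
best_mask = ga_select(data, rng)
print("features kept:", sum(best_mask), "of", len(best_mask))
print("validation accuracy:", centroid_accuracy(*data, best_mask))
```

Because the population is seeded with the all-features mask and the best mask is carried over each generation, the selected subset never scores below the full feature set on the validation split under this fitness function. In practice the fitness call would be replaced with the chosen classifier's validation accuracy, and the selected features would be evaluated on a separate held-out test set.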