Comparative Analysis of NLP Techniques for Hate Speech Classification in Online Communications
Main Author: | Gregorius Airlangga |
---|---|
Format: | Article |
Language: | English |
Published: | Universitas Islam Raden Rahmat, 2024-01-01 |
Series: | G-Tech |
Subjects: | NLP; Hateful Speech Detection; Word Embedding; TF-IDF; Machine Learning |
Online Access: | https://ejournal.uniramalang.ac.id/index.php/g-tech/article/view/3959 |
_version_ | 1797302550511222784 |
---|---|
author | Gregorius Airlangga |
collection | DOAJ |
description |
This research compared the effectiveness of two Natural Language Processing (NLP) techniques, spaCy's word embeddings and scikit-learn's TF-IDF vectorization, in identifying hate speech within online comments. Using a balanced dataset, each model was assessed on its ability to classify comments as 'hateful' or 'non-hateful'. The evaluation metrics were precision, recall, F1-score, and overall accuracy. The model using spaCy's word embeddings achieved an accuracy of 65%, with equal precision and recall for both classes. The scikit-learn TF-IDF model performed better, with an overall accuracy of 75% and a 77% recall on hateful comments. This suggests that, on this dataset, the TF-IDF model is better at discerning nuanced expressions of hate speech. The findings highlight the role of vectorization methods in automated content moderation and the need for continued model adaptation to keep pace with the evolving nature of online hate speech.
|
first_indexed | 2024-03-07T23:39:29Z |
format | Article |
id | doaj.art-8e31c687b54f4b01a5ba2ccc0fe5f983 |
institution | Directory of Open Access Journals |
issn | 2580-8737 2623-064X |
language | English |
last_indexed | 2024-03-07T23:39:29Z |
publishDate | 2024-01-01 |
publisher | Universitas Islam Raden Rahmat |
record_format | Article |
series | G-Tech |
spelling | doaj.art-8e31c687b54f4b01a5ba2ccc0fe5f983; 2024-02-20T02:50:35Z; eng; Universitas Islam Raden Rahmat; G-Tech; 2580-8737; 2623-064X; 2024-01-01; 8; 1; Comparative Analysis of NLP Techniques for Hate Speech Classification in Online Communications; Gregorius Airlangga (Atma Jaya Catholic University of Indonesia, Indonesia); https://ejournal.uniramalang.ac.id/index.php/g-tech/article/view/3959; NLP; Hateful Speech Detection; Word Embedding; TF-IDF; Machine Learning |
title | Comparative Analysis of NLP Techniques for Hate Speech Classification in Online Communications |
topic | NLP; Hateful Speech Detection; Word Embedding; TF-IDF; Machine Learning |
url | https://ejournal.uniramalang.ac.id/index.php/g-tech/article/view/3959 |
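The description reports precision, recall, F1-score, and overall accuracy for the 'hateful' and 'non-hateful' classes. As a reading aid, the sketch below shows how those four metrics are conventionally computed from true and predicted labels. It is not the author's code: the function name `evaluate_binary` and the eight toy labels are hypothetical, chosen only for illustration, and they do not reproduce the study's data or results.

```python
from collections import Counter


def evaluate_binary(y_true, y_pred, labels=("hateful", "non-hateful")):
    """Return overall accuracy plus per-class precision, recall, and F1.

    Hypothetical helper for illustration; not taken from the article.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count each (true label, predicted label) pair once.
    pairs = Counter(zip(y_true, y_pred))
    report = {}

    for label in labels:
        tp = pairs[(label, label)]
        # Predicted as `label` although the true class was different.
        fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
        # True class was `label` but the prediction was different.
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if (precision + recall)
            else 0.0
        )
        report[label] = {"precision": precision, "recall": recall, "f1": f1}

    correct = sum(n for (t, p), n in pairs.items() if t == p)
    report["accuracy"] = correct / len(y_true) if y_true else 0.0
    return report


# Hypothetical toy labels (a balanced set of 8 comments), not the study's data.
y_true = ["hateful"] * 4 + ["non-hateful"] * 4
y_pred = [
    "hateful", "hateful", "hateful", "non-hateful",
    "non-hateful", "non-hateful", "non-hateful", "hateful",
]

report = evaluate_binary(y_true, y_pred)
print(report)
```

On a balanced dataset such as the one the description mentions, overall accuracy is a fair summary, but per-class recall still matters: a moderation system that misses hateful comments fails differently from one that flags benign ones, which is why the record reports recall on the hateful class separately.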