Comparative Analysis of NLP Techniques for Hate Speech Classification in Online Communications

Bibliographic Details
Main Author: Gregorius Airlangga
Format: Article
Language: English
Published: Universitas Islam Raden Rahmat 2024-01-01
Series: G-Tech
Subjects:
Online Access: https://ejournal.uniramalang.ac.id/index.php/g-tech/article/view/3959
collection DOAJ
description This research aimed to compare the effectiveness of two Natural Language Processing (NLP) techniques, spaCy's word embeddings and scikit-learn's TF-IDF vectorization, in identifying hate speech in online comments. Each model was assessed on a balanced dataset for its ability to classify comments as 'hateful' or 'non-hateful', using precision, recall, F1-score, and overall accuracy as evaluation metrics. The model using spaCy's word embeddings achieved an accuracy of 65%, with equal precision and recall for both classes. The scikit-learn TF-IDF vectorization model performed better, reaching an overall accuracy of 75% and a recall of 77% on hateful comments, meaning it correctly identified a larger share of the comments that were actually hateful. This suggests that the TF-IDF model is more adept at discerning nuanced expressions of hate speech. The study's findings highlight the critical role of the vectorization method in automated content moderation and stress the importance of continued innovation and model adaptation to keep pace with the evolving nature of online hate speech.
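To make the reported figures concrete (for example, the 77% recall on hateful comments), here is a minimal, standard-library-only Python sketch of how per-class precision, recall, F1-score and overall accuracy are computed for the two labels 'hateful' and 'non-hateful'. The function name `per_class_report` and the toy labels are hypothetical illustrations, not the study's code or data; libraries such as scikit-learn offer an equivalent `classification_report`, but this record does not state which evaluation routine the author used.

```python
from collections import Counter


def per_class_report(y_true, y_pred, labels=("hateful", "non-hateful")):
    """Per-class precision, recall and F1, plus overall accuracy.

    Hypothetical helper for illustration; a zero denominator yields 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count how often each (true label, predicted label) pair occurs.
    pairs = Counter(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = pairs[(label, label)]
        # False positives: predicted as `label` but actually another class.
        fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
        # False negatives: actually `label` but predicted as another class.
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    # Accuracy: share of all comments whose predicted label matches the truth.
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    report["accuracy"] = correct / len(y_true) if y_true else 0.0
    return report


if __name__ == "__main__":
    # Toy labels (hypothetical, not the study's data): 4 hateful, 4 non-hateful.
    y_true = ["hateful"] * 4 + ["non-hateful"] * 4
    y_pred = ["hateful", "hateful", "hateful", "non-hateful",
              "hateful", "hateful", "non-hateful", "non-hateful"]
    report = per_class_report(y_true, y_pred)
    print(report["hateful"]["recall"])  # 0.75: 3 of 4 hateful comments caught
    print(report["accuracy"])           # 0.625: 5 of 8 labels correct
```

In the abstract's terms, recall for the hateful class is the share of genuinely hateful comments that a model flags, while precision is the share of flagged comments that are genuinely hateful; the two can diverge, which is why the abstract reports them separately from accuracy.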
first_indexed 2024-03-07T23:39:29Z
id doaj.art-8e31c687b54f4b01a5ba2ccc0fe5f983
institution Directory of Open Access Journals
issn 2580-8737, 2623-064X
last_indexed 2024-03-07T23:39:29Z
affiliation Atma Jaya Catholic University of Indonesia, Indonesia
topic NLP
Hateful Speech Detection
Word Embedding
TFIDF
Machine Learning