Comparative Analysis of NLP Techniques for Hate Speech Classification in Online Communications

This research aimed to compare the effectiveness of two Natural Language Processing (NLP) techniques—SpaCy's word embeddings and Sklearn's TF-IDF vectorization—in identifying hate speech within online comments. Utilizing a balanced dataset, each model was meticulously assessed on its abil...


Bibliographic Details
Main Author:	Gregorius Airlangga
Format:	Article
Language:	English
Published:	Universitas Islam Raden Rahmat 2024-01-01
Series:	G-Tech
Subjects:	NLP; Hateful Speech Detection; Word Embedding; TFIDF; Machine Learning
Online Access:	https://ejournal.uniramalang.ac.id/index.php/g-tech/article/view/3959

_version_	1797302550511222784
author	Gregorius Airlangga
collection	DOAJ
description	This research compared the effectiveness of two Natural Language Processing (NLP) techniques, SpaCy's word embeddings and Sklearn's TF-IDF vectorization, in identifying hate speech within online comments. Using a balanced dataset, each model was assessed on its ability to classify comments as 'hateful' or 'non-hateful'. The evaluation metrics were precision, recall, F1-score, and overall accuracy. The model using SpaCy's word embeddings achieved an accuracy of 65%, with equal precision and recall for both classes. The Sklearn TF-IDF vectorization model performed better, with an overall accuracy of 75% and a stronger ability to correctly identify hateful comments, evidenced by a 77% recall rate. This suggests that the TF-IDF model is more adept at discerning nuanced expressions of hate speech. The study's findings highlight the critical role of vectorization methods in automated content moderation and stress the importance of continued innovation and model adaptation to manage the evolving nature of online hate speech.
first_indexed	2024-03-07T23:39:29Z
format	Article
id	doaj.art-8e31c687b54f4b01a5ba2ccc0fe5f983
institution	Directory Open Access Journal
issn	2580-8737 2623-064X
language	English
last_indexed	2024-03-07T23:39:29Z
publishDate	2024-01-01
publisher	Universitas Islam Raden Rahmat
record_format	Article
series	G-Tech
spelling	doaj.art-8e31c687b54f4b01a5ba2ccc0fe5f983; 2024-02-20T02:50:35Z; eng; Universitas Islam Raden Rahmat; G-Tech; 2580-8737; 2623-064X; 2024-01-01; 81; Comparative Analysis of NLP Techniques for Hate Speech Classification in Online Communications; Gregorius Airlangga (Atma Jaya Catholic University of Indonesia, Indonesia); https://ejournal.uniramalang.ac.id/index.php/g-tech/article/view/3959; NLP; Hateful Speech Detection; Word Embedding; TFIDF; Machine Learning
title	Comparative Analysis of NLP Techniques for Hate Speech Classification in Online Communications
topic	NLP; Hateful Speech Detection; Word Embedding; TFIDF; Machine Learning
url	https://ejournal.uniramalang.ac.id/index.php/g-tech/article/view/3959
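
The description in this record names precision, recall, F1-score, and overall accuracy as the evaluation metrics. The following is a minimal Python sketch of how those four values are computed for a binary 'hateful' / 'non-hateful' task, treating 'hateful' as the positive class. The confusion-matrix counts are hypothetical: they assume a balanced test set of 200 comments and were chosen only so that accuracy and hateful-class recall match the reported 75% and 77%. They are not taken from the article, and the names `BinaryCounts` and `binary_metrics` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class BinaryCounts:
    """Confusion-matrix counts with 'hateful' as the positive class."""
    tp: int  # hateful comments predicted hateful
    fp: int  # non-hateful comments predicted hateful
    tn: int  # non-hateful comments predicted non-hateful
    fn: int  # hateful comments predicted non-hateful


def binary_metrics(c: BinaryCounts) -> dict[str, float]:
    """Return precision, recall, F1 (positive class) and overall accuracy."""
    predicted_pos = c.tp + c.fp
    actual_pos = c.tp + c.fn
    total = c.tp + c.fp + c.tn + c.fn

    precision = c.tp / predicted_pos if predicted_pos else 0.0
    recall = c.tp / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# HYPOTHETICAL counts (not from the article): 100 hateful and 100
# non-hateful test comments, picked so that accuracy is 0.75 and
# hateful-class recall is 0.77, as reported in the description.
counts = BinaryCounts(tp=77, fp=27, tn=73, fn=23)
metrics = binary_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The precision and F1 values this prints follow from the assumed counts alone; a different split of errors between the two classes would give the same accuracy and recall with different precision, so they should not be read as results of the study.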


Similar Items