Evaluating feature combination strategies for hate-speech detection in Spanish using linguistic features and transformers

Abstract The rise of social networks has allowed misogynistic, xenophobic, and homophobic people to spread their hate-speech to intimidate individuals or groups because of their gender, ethnicity or sexual orientation. The consequences of hate-speech are devastating, causing severe depression and ev...


Bibliographic Details
Main Authors:	José Antonio García-Díaz, Salud María Jiménez-Zafra, Miguel Angel García-Cumbreras, Rafael Valencia-García
Format:	Article
Language:	English
Published:	Springer 2022-02-01
Series:	Complex & Intelligent Systems
Subjects:	Hate-speech Feature engineering Knowledge integration Text classification Natural language processing
Online Access:	https://doi.org/10.1007/s40747-022-00693-x

_version_	1797806467121676288
author	José Antonio García-Díaz Salud María Jiménez-Zafra Miguel Angel García-Cumbreras Rafael Valencia-García
author_sort	José Antonio García-Díaz
collection	DOAJ
description	Abstract The rise of social networks has allowed misogynistic, xenophobic, and homophobic people to spread hate-speech to intimidate individuals or groups because of their gender, ethnicity or sexual orientation. The consequences of hate-speech are devastating, causing severe depression and even leading people to commit suicide. Hate-speech identification is challenging because the large volume of daily publications makes it impossible to review every comment by hand. Moreover, hate-speech is also spread through hoaxes, which require language and context understanding to detect. With the aim of reducing the number of comments that must be reviewed by experts, or even of developing autonomous systems, the automatic identification of hate-speech has gained academic relevance. However, the reliability of automatic approaches is still limited, particularly in languages other than English, for which some state-of-the-art techniques have not been analyzed in detail. In this work, we examine which features are most effective in identifying hate-speech in Spanish and how these features can be combined to develop more accurate systems. In addition, we characterize the language present in each type of hate-speech by means of explainable linguistic features and compare our results with state-of-the-art approaches. Our research indicates that combining linguistic features and transformers by means of knowledge integration outperforms current solutions for hate-speech identification in Spanish.
first_indexed	2024-03-13T06:07:45Z
format	Article
id	doaj.art-cd202626b0104926b5c64980062f7ea8
institution	Directory Open Access Journal
issn	2199-4536 2198-6053
language	English
last_indexed	2024-03-13T06:07:45Z
publishDate	2022-02-01
publisher	Springer
record_format	Article
series	Complex & Intelligent Systems
spelling	doaj.art-cd202626b0104926b5c64980062f7ea8; 2023-06-11T11:29:40Z; eng; Springer; Complex & Intelligent Systems; ISSN 2199-4536, 2198-6053; 2022-02-01; vol. 9, no. 3, pp. 2893-2914; 10.1007/s40747-022-00693-x; Evaluating feature combination strategies for hate-speech detection in Spanish using linguistic features and transformers; José Antonio García-Díaz (Facultad de Informática, Universidad de Murcia); Salud María Jiménez-Zafra (Computer Science Department, SINAI, CEATIC, Universidad de Jaén); Miguel Angel García-Cumbreras (Computer Science Department, SINAI, CEATIC, Universidad de Jaén); Rafael Valencia-García (Facultad de Informática, Universidad de Murcia); https://doi.org/10.1007/s40747-022-00693-x; Hate-speech; Feature engineering; Knowledge integration; Text classification; Natural language processing
title	Evaluating feature combination strategies for hate-speech detection in Spanish using linguistic features and transformers
topic	Hate-speech Feature engineering Knowledge integration Text classification Natural language processing
url	https://doi.org/10.1007/s40747-022-00693-x
