Machine Learning Techniques for the Detection of Inappropriate Erotic Content in Text

Bibliographic Details
Main Authors: Gonzalo Molpeceres Barrientos, Rocío Alaiz-Rodríguez, Víctor González-Castro, Andrew C. Parnell
Format: Article
Language: English
Published: Springer, 2020-06-01
Series: International Journal of Computational Intelligence Systems
ISSN: 1875-6883
DOI: 10.2991/ijcis.d.200519.003
Subjects: Inappropriate content; Machine learning; Text classification; Natural language processing; Text encoders
Online Access: https://www.atlantis-press.com/article/125941254/view

Abstract
Nowadays, children have regular access to the Internet. Just like the real world, the Internet has many unsafe locations where kids may be exposed to inappropriate content in the form of obscene, aggressive, erotic or rude comments. In this work, we address the problem of detecting erotic/sexual content in text documents using Natural Language Processing (NLP) techniques. Following a Machine Learning approach, we assessed twelve models obtained by combining three text encoders (Bag of Words, Term Frequency-Inverse Document Frequency (TF-IDF) and Word2vec) with four classifiers (Support Vector Machines (SVMs), Logistic Regression, k-Nearest Neighbors and Random Forests). We evaluated these alternatives on a newly created dataset built from public data on the Reddit website. The best performance was achieved by combining the TF-IDF text encoder with an SVM classifier with a linear kernel, which reached an accuracy of 0.97 and an F-score of 0.96 (precision 0.96, recall 0.95). This study demonstrates that erotic content can be detected in text documents and, therefore, that filters can be developed to protect minors or to match users' preferences.
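
The best-performing configuration described in the abstract (a TF-IDF text encoder followed by a linear SVM, scored with accuracy, precision, recall and F-score) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the use of the Pegasos stochastic sub-gradient solver to train the linear SVM (without a bias term), the regularisation value `lam`, the helper names and the toy documents are all hypothetical choices made for brevity. The paper's Reddit dataset and preprocessing are not reproduced here.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; no stemming or stop-word removal.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF (an assumed variant): idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(doc, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen at fit time are ignored."""
    tf = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {term: v / norm for term, v in vec.items()} if norm else vec


def dot(w, x):
    """Inner product of a weight dict and a sparse feature dict."""
    return sum(w.get(term, 0.0) * v for term, v in x.items())


def train_linear_svm(xs, ys, lam=0.01, epochs=50, seed=0):
    """Linear SVM trained with Pegasos (stochastic sub-gradient on the
    L2-regularised hinge loss). Labels must be +1/-1. No bias term
    (hypothetical simplification)."""
    rng = random.Random(seed)
    w = {}
    order = list(range(len(xs)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            violates = ys[i] * dot(w, xs[i]) < 1  # is the hinge loss active?
            for term in w:  # regulariser gradient shrinks every weight
                w[term] *= 1.0 - eta * lam
            if violates:  # hinge sub-gradient moves w towards y_i * x_i
                for term, v in xs[i].items():
                    w[term] = w.get(term, 0.0) + eta * ys[i] * v
    return w


def predict(w, idf, doc):
    return 1 if dot(w, tfidf_vector(doc, idf)) >= 0 else -1


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-score (F1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: +1 = flagged as adult content, -1 = safe.
train_docs = [
    "explicit adult story", "adult explicit scene", "explicit content here",
    "garden recipe tips", "weekend garden plans", "easy pasta recipe",
]
train_y = [1, 1, 1, -1, -1, -1]

idf = fit_idf(train_docs)
w = train_linear_svm([tfidf_vector(d, idf) for d in train_docs], train_y)

test_docs = ["adult scene", "pasta plans"]
test_y = [1, -1]
pred = [predict(w, idf, d) for d in test_docs]
print(pred, evaluate(test_y, pred))
```

The hand-rolled solver only makes each step visible; a practical pipeline would more likely rely on an established library (for example scikit-learn's `TfidfVectorizer` and `LinearSVC`) and report the metrics on a held-out test split rather than on a handful of toy sentences.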