Decision Algorithm for the Automatic Determination of the Use of Non-Inclusive Terms in Academic Texts

The use of inclusive language, among many other gender equality initiatives in society, has garnered great attention in recent years. Gender equality offices in universities and public administration cannot cope with the task of manually checking the use of non-inclusive language in the documentatio...


Bibliographic Details
Main Authors: Pedro Orgeira-Crespo, Carla Míguez-Álvarez, Miguel Cuevas-Alonso, María Isabel Doval-Ruiz
Format: Article
Language: English
Published: MDPI AG 2020-08-01
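Series: Publications
Subjects: inclusive language, Spanish language, natural language processing, classification algorithm, machine learning
Online Access: https://www.mdpi.com/2304-6775/8/3/41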
_version_ 1797559920903585792
author Pedro Orgeira-Crespo
Carla Míguez-Álvarez
Miguel Cuevas-Alonso
María Isabel Doval-Ruiz
author_sort Pedro Orgeira-Crespo
collection DOAJ
description The use of inclusive language, among many other gender equality initiatives in society, has garnered great attention in recent years. Gender equality offices in universities and public administration cannot cope with the task of manually checking the use of non-inclusive language in the documentation that those institutions generate. In this research, an automated solution for the detection of non-inclusive uses of the Spanish language in doctoral theses generated in Spanish universities is introduced using machine learning techniques. A large dataset has been used to train, validate, and analyze the use of inclusive language; the result is an algorithm that detects, within any Spanish text document, non-inclusive uses of the language with error, false positive, and false negative ratios slightly over 10%, and precision, recall, and F-measure percentages over 86%. Results also show the evolution with time of the ratio of non-inclusive usages per document, having a pronounced reduction in the last years under study.
first_indexed 2024-03-10T17:53:07Z
format Article
id doaj.art-a0b823d44ccc43cca9e1397a0d3357c3
institution Directory Open Access Journal
issn 2304-6775
language English
last_indexed 2024-03-10T17:53:07Z
publishDate 2020-08-01
publisher MDPI AG
record_format Article
series Publications
spelling doaj.art-a0b823d44ccc43cca9e1397a0d3357c3
2023-11-20T09:18:36Z
eng
MDPI AG
Publications
2304-6775
2020-08-01
8
3
41
10.3390/publications8030041
Decision Algorithm for the Automatic Determination of the Use of Non-Inclusive Terms in Academic Texts
Pedro Orgeira-Crespo [0]
Carla Míguez-Álvarez [1]
Miguel Cuevas-Alonso [2]
María Isabel Doval-Ruiz [3]
[0] Aerospace Area, Department of Mechanical Engineering, Heat Engines and Machines, and Fluids, Aerospace Engineering School, University of Vigo, Campus Orense, 32004 Orense, Spain
[1] Language Variation and Textual Categorization (LVTC), Philology and Translation School, University of Vigo, 36310 Vigo, Spain
[2] Language Variation and Textual Categorization (LVTC), Philology and Translation School, University of Vigo, 36310 Vigo, Spain
[3] Faculty of Educational Sciences, University of Vigo, Campus Lagoas Marcosende, 36310 Vigo, Spain
The use of inclusive language, among many other gender equality initiatives in society, has garnered great attention in recent years. Gender equality offices in universities and public administration cannot cope with the task of manually checking the use of non-inclusive language in the documentation that those institutions generate. In this research, an automated solution for the detection of non-inclusive uses of the Spanish language in doctoral theses generated in Spanish universities is introduced using machine learning techniques. A large dataset has been used to train, validate, and analyze the use of inclusive language; the result is an algorithm that detects, within any Spanish text document, non-inclusive uses of the language with error, false positive, and false negative ratios slightly over 10%, and precision, recall, and F-measure percentages over 86%. Results also show the evolution with time of the ratio of non-inclusive usages per document, having a pronounced reduction in the last years under study.
https://www.mdpi.com/2304-6775/8/3/41
inclusive language
Spanish language
natural language processing
classification algorithm
machine learning
title Decision Algorithm for the Automatic Determination of the Use of Non-Inclusive Terms in Academic Texts
topic inclusive language
Spanish language
natural language processing
classification algorithm
machine learning
url https://www.mdpi.com/2304-6775/8/3/41