Comparison and benchmark of name-to-gender inference services

Bibliographic Details
Main Authors: Lucía Santamaría (Amazon Development Center, Berlin, Germany), Helena Mihaljević (University of Applied Sciences, Berlin, Germany)
Format: Article
Language: English
Published: PeerJ Inc., 2018-07-01
Series: PeerJ Computer Science, vol. 4, article e156
ISSN: 2376-5992
DOI: 10.7717/peerj-cs.156
Subjects: Name-based gender inference; Classification algorithms; Performance evaluation; Gender analysis; Scientometrics; Bibliometrics
Collection: DOAJ (Directory of Open Access Journals)
Online Access: https://peerj.com/articles/cs-156.pdf

Abstract
The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with information from social media profiles, culture-specific rules, and insights from sociolinguistics. We compare and benchmark five name-to-gender inference services by applying them to the classification of a test data set consisting of 7,076 manually labeled names. The compiled names are analyzed and characterized according to their geographical and cultural origin. We define a series of performance metrics to quantify various types of classification errors, and define a parameter tuning procedure to search for optimal values of the services’ free parameters. Finally, we perform benchmarks of all services under study regarding several scenarios where a particular metric is to be optimized.
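The abstract mentions performance metrics for different types of classification errors and a tuning procedure for the services' free parameters, but does not spell either out. The Python sketch below shows one plausible shape for both; it is not the authors' definitions or code. It assumes, hypothetically, that each service answers with a label "m", "f", or "u" (unknown), a confidence probability, and a count of records behind the answer, and that the free parameters are minimum thresholds on those two values. The names `Prediction`, `metrics`, `apply_thresholds`, `tune`, `min_prob`, `min_count`, and `max_unknown` are illustrative only.

```python
from dataclasses import dataclass
from itertools import product

TRUE_LABELS = ("m", "f")       # ground-truth genders in the test set (assumed binary)
PRED_LABELS = ("m", "f", "u")  # a service may also answer "u" (unknown)


@dataclass(frozen=True)
class Prediction:
    """Hypothetical response of a name-to-gender service for one name."""
    gender: str         # "m", "f", or "u"
    probability: float  # service-reported confidence in [0, 1] (assumed field)
    count: int          # number of records backing the answer (assumed field)


def confusion(truth, predicted):
    """Count (true label, predicted label) pairs."""
    counts = {(t, p): 0 for t in TRUE_LABELS for p in PRED_LABELS}
    for t, p in zip(truth, predicted):
        counts[(t, p)] += 1
    return counts


def metrics(truth, predicted):
    """Illustrative error metrics that treat 'u' answers explicitly."""
    c = confusion(truth, predicted)
    mm, mf, mu = c[("m", "m")], c[("m", "f")], c[("m", "u")]
    fm, ff, fu = c[("f", "m")], c[("f", "f")], c[("f", "u")]
    total = mm + mf + mu + fm + ff + fu
    classified = mm + mf + fm + ff  # names that received a definite "m" or "f"
    return {
        # misclassifications plus unknowns, over all names
        "error_with_unknown": (mf + fm + mu + fu) / total,
        # misclassifications only, over names with a definite answer
        "error_without_unknown": (mf + fm) / classified if classified else 0.0,
        # share of names left unclassified
        "unknown_rate": (mu + fu) / total,
        # > 0 means more men labeled "f" than women labeled "m"
        "gender_bias": (mf - fm) / classified if classified else 0.0,
    }


def apply_thresholds(preds, min_prob, min_count):
    """Downgrade low-confidence answers to 'u' (hypothetical free parameters)."""
    return [
        p.gender
        if (p.gender != "u" and p.probability >= min_prob and p.count >= min_count)
        else "u"
        for p in preds
    ]


def tune(truth, preds, prob_grid, count_grid,
         target="error_without_unknown", max_unknown=0.25):
    """Grid search minimizing `target` subject to unknown_rate <= max_unknown.

    Returns (best_value, min_prob, min_count), or None if no setting qualifies.
    """
    best = None
    for min_prob, min_count in product(prob_grid, count_grid):
        m = metrics(truth, apply_thresholds(preds, min_prob, min_count))
        if m["unknown_rate"] > max_unknown:
            continue  # too many names left unclassified
        if best is None or m[target] < best[0]:
            best = (m[target], min_prob, min_count)
    return best
```

Separating unknown answers from misclassifications makes the trade-off visible: raising the thresholds can lower `error_without_unknown` while increasing `unknown_rate`. The `max_unknown` cap is one hypothetical way to optimize a chosen metric without letting the share of unclassified names grow unchecked.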