Comparison and benchmark of name-to-gender inference services
The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with inform...
Main Authors: | Lucía Santamaría, Helena Mihaljević |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc. 2018-07-01 |
Series: | PeerJ Computer Science |
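Subjects: | Name-based gender inference, Classification algorithms, Performance evaluation, Gender analysis, Scientometrics, Bibliometrics |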
Online Access: | https://peerj.com/articles/cs-156.pdf |
_version_ | 1818565951603343360 |
---|---|
author | Lucía Santamaría Helena Mihaljević |
collection | DOAJ |
description | The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with information from social media profiles, culture-specific rules, and insights from sociolinguistics. We compare and benchmark five name-to-gender inference services by applying them to the classification of a test data set consisting of 7,076 manually labeled names. The compiled names are analyzed and characterized according to their geographical and cultural origin. We define a series of performance metrics to quantify various types of classification errors, and define a parameter tuning procedure to search for optimal values of the services’ free parameters. Finally, we perform benchmarks of all services under study regarding several scenarios where a particular metric is to be optimized. |
first_indexed | 2024-12-14T01:47:27Z |
format | Article |
id | doaj.art-f9e74c450776437b886d4d3a4978dbba |
institution | Directory of Open Access Journals |
issn | 2376-5992 |
language | English |
last_indexed | 2024-12-14T01:47:27Z |
publishDate | 2018-07-01 |
publisher | PeerJ Inc. |
record_format | Article |
series | PeerJ Computer Science |
spelling | doaj.art-f9e74c450776437b886d4d3a4978dbba; 2022-12-21T23:21:30Z; eng; PeerJ Inc.; PeerJ Computer Science; ISSN 2376-5992; 2018-07-01; vol. 4, e156; DOI 10.7717/peerj-cs.156; Comparison and benchmark of name-to-gender inference services; Lucía Santamaría (Amazon Development Center, Berlin, Germany); Helena Mihaljević (University of Applied Sciences, Berlin, Germany); The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with information from social media profiles, culture-specific rules, and insights from sociolinguistics. We compare and benchmark five name-to-gender inference services by applying them to the classification of a test data set consisting of 7,076 manually labeled names. The compiled names are analyzed and characterized according to their geographical and cultural origin. We define a series of performance metrics to quantify various types of classification errors, and define a parameter tuning procedure to search for optimal values of the services’ free parameters. Finally, we perform benchmarks of all services under study regarding several scenarios where a particular metric is to be optimized.; https://peerj.com/articles/cs-156.pdf; Name-based gender inference; Classification algorithms; Performance evaluation; Gender analysis; Scientometrics; Bibliometrics |
title | Comparison and benchmark of name-to-gender inference services |
topic | Name-based gender inference Classification algorithms Performance evaluation Gender analysis Scientometrics Bibliometrics |
url | https://peerj.com/articles/cs-156.pdf |