On the use of collections with unreliably determined sex and age characteristics in model training for sex determination by traits of the standard craniometric program

The study examines the feasibility of applying machine-learning methods to sex determination from craniometric features when working with materials from archaeological excavations. A specific feature of such materials is that the sex and age of individuals are estimated subjectively. The...


Bibliographic Details
Main Author: Shirobokov I.G.
Format: Article
Language: Russian
Published: Tyumen Scientific Centre SB RA 2023-09-01
Series: Вестник археологии, антропологии и этнографии
Subjects:
Online Access: http://ipdn.ru/_private/a62/129-138.pdf
_version_ 1797688212988100608
author Shirobokov I.G.
collection DOAJ
description The study examines the feasibility of applying machine-learning methods to sex determination from craniometric features when working with materials from archaeological excavations. A specific feature of such materials is that the sex and age of individuals are estimated subjectively. The main object of the analysis was a sample measured by V.P. Alekseev, comprising 258 crania (137 male and 121 female) that characterise the Russian population of the European part of Russia in the 17th–18th centuries. The test sample was a group of Russian crania with documented sex and age, registered within several collections of the Kunstkamera’s repository and also measured by V.P. Alekseev. The series includes 89 male and 10 female skulls, which came to the museum from the Military Medical Academy in 1911–1914 through the efforts of the Russian anatomist K.Z. Yatsuta. The models were trained, validated, and tested using four methods: discriminant analysis, logistic regression, random forest, and support vector machine. Thirty-three craniometric traits were included in the analysis, from which a group of five features with the highest differentiating ability was chosen (Martin's Nos. 1, 40, 43, 45, 75(1)). Models built on either set of traits yielded comparable performance indicators. According to the cross-validation results, all four models reproduced the sex estimates given by V.P. Alekseev in 85–88 % of cases on average. When the models were applied to the test sample, the proportion of accurate classifications did not change and stood at 87–88 %. At the same time, the machine-learning methods showed no noticeable advantage in classification accuracy over linear discriminant analysis. In general, the efficiency of the obtained models corresponds to the average value of the indicators calculated from the materials of 80 publications (86 %). It is likely that the crania whose sex cannot be correctly classified either by the models or by visual assessment constitute overlapping sets that share some morphological features making them resemble individuals of the opposite sex. Application of the models to the skulls of the test sample as re-measured by the author revealed some deterioration of the performance indicators in all four cases. The decrease in the proportion of accurate classifications is caused mainly by discrepancies in the estimation of the nasal protrusion angle, as well as by subjective errors in size estimation for poorly preserved crania and those with partial atrophy of the alveolar process.
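The cross-validation step mentioned in the description can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic random numbers (not craniometric measurements), only the class sizes 137 and 121 and the count of five traits are taken from the abstract, and a nearest-centroid classifier stands in for the four methods actually used in the study. The function names are hypothetical.

```python
import random
from statistics import mean


def make_synthetic_sample(n_male=137, n_female=121, n_traits=5, seed=0):
    """Build a synthetic stand-in sample; these are NOT real measurements."""
    rng = random.Random(seed)
    rows = []
    # Assumption: the two classes differ by a mean shift in every trait.
    for label, n, shift in (("M", n_male, 1.0), ("F", n_female, -1.0)):
        for _ in range(n):
            rows.append(([rng.gauss(shift, 1.5) for _ in range(n_traits)], label))
    rng.shuffle(rows)
    return rows


def fit_centroids(rows):
    """Compute the mean trait vector of each class."""
    centroids = {}
    for label in {lab for _, lab in rows}:
        vectors = [x for x, lab in rows if lab == label]
        centroids[label] = [mean(col) for col in zip(*vectors)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def cross_validate(rows, k=5):
    """k-fold cross-validation: each fold is held out once for scoring."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        centroids = fit_centroids(train)
        hits = sum(predict(centroids, x) == lab for x, lab in held_out)
        scores.append(hits / len(held_out))
    return scores


sample = make_synthetic_sample()
fold_scores = cross_validate(sample, k=5)
print(f"mean cross-validated accuracy: {mean(fold_scores):.3f}")
```

The printed accuracy reflects only the synthetic class separation chosen above and says nothing about the figures reported in the article.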
first_indexed 2024-03-12T01:28:03Z
format Article
id doaj.art-1f308c922472409892067415e1ba9b88
institution Directory Open Access Journal
issn 1811-7465
2071-0437
language Russian
last_indexed 2024-03-12T01:28:03Z
publishDate 2023-09-01
publisher Tyumen Scientific Centre SB RA
record_format Article
series Вестник археологии, антропологии и этнографии
spelling doaj.art-1f308c922472409892067415e1ba9b88
2023-09-12T10:43:43Z
rus
Tyumen Scientific Centre SB RA
Вестник археологии, антропологии и этнографии
1811-7465
2071-0437
2023-09-01
3(62)
129
138
10.20874/2071-0437-2023-62-3-11
On the use of collections with unreliably determined sex and age characteristics in model training for sex determination by traits of the standard craniometric program
Shirobokov I.G.
https://orcid.org/0000-0002-3555-7509
Peter the Great Museum of Anthropology and Ethnography
http://ipdn.ru/_private/a62/129-138.pdf
sex estimation
craniometric traits
discriminant analysis
support vector machine
logistic regression
random forest
machine learning methods
title On the use of collections with unreliably determined sex and age characteristics in model training for sex determination by traits of the standard craniometric program
topic sex estimation
craniometric traits
discriminant analysis
support vector machine
logistic regression
random forest
machine learning methods
url http://ipdn.ru/_private/a62/129-138.pdf