Open-Access Worldwide Population STR Database Constructed Using High-Coverage Massively Parallel Sequencing Data Obtained from the 1000 Genomes Project

Achieving accurate STR genotyping by using next-generation sequencing data has been challenging. To provide the forensic genetics community with a reliable open-access STR database, we conducted a comprehensive genotyping analysis of a set of STRs of broad forensic interest obtained from 1000 Genome...


Bibliographic Details
Main Authors: Tamara Soledad Frontanilla, Guilherme Valle-Silva, Jesus Ayala, Celso Teixeira Mendes-Junior
Format: Article
Language: English
Published: MDPI AG 2022-11-01
Series: Genes
Subjects: HipSTR; allele frequencies; forensic genetics; worldwide population; bioinformatics
Online Access: https://www.mdpi.com/2073-4425/13/12/2205
author Tamara Soledad Frontanilla
Guilherme Valle-Silva
Jesus Ayala
Celso Teixeira Mendes-Junior
collection DOAJ
description Achieving accurate STR genotyping by using next-generation sequencing data has been challenging. To provide the forensic genetics community with a reliable open-access STR database, we conducted a comprehensive genotyping analysis of a set of STRs of broad forensic interest obtained from 1000 Genome populations. We analyzed 22 STR markers using files of the high-coverage dataset of Phase 3 of the 1000 Genomes Project. We used HipSTR to call genotypes from 2504 samples obtained from 26 populations. We were not able to detect the D21S11 marker. The Hardy-Weinberg equilibrium analysis coupled with a comprehensive analysis of allele frequencies revealed that HipSTR was not able to identify longer alleles, which resulted in heterozygote deficiency. Nevertheless, AMOVA, a clustering analysis that uses STRUCTURE, and a Principal Coordinates Analysis showed a clear-cut separation between the four major ancestries sampled by the 1000 Genomes Consortium. Except for larger Penta D and Penta E alleles, and two very small Penta D alleles (2.2 and 3.2) usually observed in African populations, our analyses revealed that allele frequencies and genotypes offered as an open-access database are consistent and reliable.
first_indexed 2024-03-09T16:30:33Z
format Article
id doaj.art-8af6b1a96f2444ef8a2a4a2e98ed1fdd
institution Directory Open Access Journal
issn 2073-4425
language English
last_indexed 2024-03-09T16:30:33Z
publishDate 2022-11-01
publisher MDPI AG
record_format Article
series Genes
spelling doaj.art-8af6b1a96f2444ef8a2a4a2e98ed1fdd
2023-11-24T15:02:45Z
eng
MDPI AG
Genes
2073-4425
2022-11-01
13
12
2205
10.3390/genes13122205
Open-Access Worldwide Population STR Database Constructed Using High-Coverage Massively Parallel Sequencing Data Obtained from the 1000 Genomes Project
Tamara Soledad Frontanilla 0
Guilherme Valle-Silva 1
Jesus Ayala 2
Celso Teixeira Mendes-Junior 3
Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14049-900, SP, Brazil
Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14040-901, SP, Brazil
Facultad de Ingeniería Informática, Universidad de la Integración de las Americas, Asunción 00120-6, Paraguay
Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14040-901, SP, Brazil
Achieving accurate STR genotyping by using next-generation sequencing data has been challenging. To provide the forensic genetics community with a reliable open-access STR database, we conducted a comprehensive genotyping analysis of a set of STRs of broad forensic interest obtained from 1000 Genome populations. We analyzed 22 STR markers using files of the high-coverage dataset of Phase 3 of the 1000 Genomes Project. We used HipSTR to call genotypes from 2504 samples obtained from 26 populations. We were not able to detect the D21S11 marker. The Hardy-Weinberg equilibrium analysis coupled with a comprehensive analysis of allele frequencies revealed that HipSTR was not able to identify longer alleles, which resulted in heterozygote deficiency. Nevertheless, AMOVA, a clustering analysis that uses STRUCTURE, and a Principal Coordinates Analysis showed a clear-cut separation between the four major ancestries sampled by the 1000 Genomes Consortium. Except for larger Penta D and Penta E alleles, and two very small Penta D alleles (2.2 and 3.2) usually observed in African populations, our analyses revealed that allele frequencies and genotypes offered as an open-access database are consistent and reliable.
https://www.mdpi.com/2073-4425/13/12/2205
HipSTR
allele frequencies
forensic genetics
worldwide population
bioinformatics
title Open-Access Worldwide Population STR Database Constructed Using High-Coverage Massively Parallel Sequencing Data Obtained from the 1000 Genomes Project
topic HipSTR
allele frequencies
forensic genetics
worldwide population
bioinformatics
url https://www.mdpi.com/2073-4425/13/12/2205