Open-Access Worldwide Population STR Database Constructed Using High-Coverage Massively Parallel Sequencing Data Obtained from the 1000 Genomes Project
Achieving accurate STR genotyping by using next-generation sequencing data has been challenging. To provide the forensic genetics community with a reliable open-access STR database, we conducted a comprehensive genotyping analysis of a set of STRs of broad forensic interest obtained from 1000 Genome...
Main Authors: | Tamara Soledad Frontanilla, Guilherme Valle-Silva, Jesus Ayala, Celso Teixeira Mendes-Junior |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-11-01 |
Series: | Genes |
Subjects: | HipSTR; allele frequencies; forensic genetics; worldwide population; bioinformatics |
Online Access: | https://www.mdpi.com/2073-4425/13/12/2205 |
_version_ | 1797457910991683584 |
---|---|
author | Tamara Soledad Frontanilla; Guilherme Valle-Silva; Jesus Ayala; Celso Teixeira Mendes-Junior |
author_facet | Tamara Soledad Frontanilla; Guilherme Valle-Silva; Jesus Ayala; Celso Teixeira Mendes-Junior |
author_sort | Tamara Soledad Frontanilla |
collection | DOAJ |
description | Achieving accurate STR genotyping by using next-generation sequencing data has been challenging. To provide the forensic genetics community with a reliable open-access STR database, we conducted a comprehensive genotyping analysis of a set of STRs of broad forensic interest obtained from 1000 Genome populations. We analyzed 22 STR markers using files of the high-coverage dataset of Phase 3 of the 1000 Genomes Project. We used HipSTR to call genotypes from 2504 samples obtained from 26 populations. We were not able to detect the D21S11 marker. The Hardy-Weinberg equilibrium analysis coupled with a comprehensive analysis of allele frequencies revealed that HipSTR was not able to identify longer alleles, which resulted in heterozygote deficiency. Nevertheless, AMOVA, a clustering analysis that uses STRUCTURE, and a Principal Coordinates Analysis showed a clear-cut separation between the four major ancestries sampled by the 1000 Genomes Consortium. Except for larger Penta D and Penta E alleles, and two very small Penta D alleles (2.2 and 3.2) usually observed in African populations, our analyses revealed that allele frequencies and genotypes offered as an open-access database are consistent and reliable. |
first_indexed | 2024-03-09T16:30:33Z |
format | Article |
id | doaj.art-8af6b1a96f2444ef8a2a4a2e98ed1fdd |
institution | Directory Open Access Journal |
issn | 2073-4425 |
language | English |
last_indexed | 2024-03-09T16:30:33Z |
publishDate | 2022-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Genes |
spelling | doaj.art-8af6b1a96f2444ef8a2a4a2e98ed1fdd; 2023-11-24T15:02:45Z; eng; MDPI AG; Genes; 2073-4425; 2022-11-01; 13; 12; 2205; 10.3390/genes13122205; Open-Access Worldwide Population STR Database Constructed Using High-Coverage Massively Parallel Sequencing Data Obtained from the 1000 Genomes Project; Tamara Soledad Frontanilla (Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14049-900, SP, Brazil); Guilherme Valle-Silva (Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14040-901, SP, Brazil); Jesus Ayala (Facultad de Ingeniería Informática, Universidad de la Integración de las Americas, Asunción 00120-6, Paraguay); Celso Teixeira Mendes-Junior (Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14040-901, SP, Brazil); https://www.mdpi.com/2073-4425/13/12/2205; HipSTR; allele frequencies; forensic genetics; worldwide population; bioinformatics |
title | Open-Access Worldwide Population STR Database Constructed Using High-Coverage Massively Parallel Sequencing Data Obtained from the 1000 Genomes Project |
topic | HipSTR; allele frequencies; forensic genetics; worldwide population; bioinformatics |
url | https://www.mdpi.com/2073-4425/13/12/2205 |