A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research
Two major forces have contributed to the fast growth of human genetic data. One comes from medical research supported by governments and academic institutes, and the other from direct-to-consumer (DTC) sequencing companies. While the former benefits from meticulously designed sequencing standards and quality...
Main Authors: | Chang Lu, Bastian Greshake Tzovaras, Julian Gough |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2021-01-01 |
Series: | Computational and Structural Biotechnology Journal |
Subjects: | Genotyping; Direct-to-consumer sequencing; Open genome; Personal genome; SNP arrays |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2001037021002786 |
_version_ | 1818979566484455424 |
---|---|
author | Chang Lu; Bastian Greshake Tzovaras; Julian Gough |
author_facet | Chang Lu; Bastian Greshake Tzovaras; Julian Gough |
author_sort | Chang Lu |
collection | DOAJ |
description | Two major forces have contributed to the fast growth of human genetic data. One comes from medical research supported by governments and academic institutes, and the other from direct-to-consumer (DTC) sequencing companies. While the former benefits from meticulously designed sequencing standards and quality control procedures, the latter comes in various formats and from various sequencing methods, which change over time and with the particular needs of different companies. Thanks to members of the general public who shared their DNA data without constraint, we review over 7000 genomes made public between 2011 and 2020 and produced by more than six DTC sequencing companies. We provide an open-source toolkit, GenomePrep, that systematically parses, quality-checks and filters genome files and statistically problematic alleles to prepare consumer DNA datasets for research. The GenomePrep output is available in two common DNA data file formats to enable further analysis with other tools. We also provide for download the combined output for all OpenSNP array genomes processed in this paper in a single data freeze file. |
first_indexed | 2024-12-20T17:01:34Z |
format | Article |
id | doaj.art-b30848b348304fe4a2776b0b54fa6402 |
institution | Directory of Open Access Journals |
issn | 2001-0370 |
language | English |
last_indexed | 2024-12-20T17:01:34Z |
publishDate | 2021-01-01 |
publisher | Elsevier |
record_format | Article |
series | Computational and Structural Biotechnology Journal |
spelling | doaj.art-b30848b348304fe4a2776b0b54fa6402; 2022-12-21T19:32:31Z; eng; Elsevier; Computational and Structural Biotechnology Journal; 2001-0370; 2021-01-01; 19; 3747; 3754; A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research; Chang Lu (0); Bastian Greshake Tzovaras (1); Julian Gough (2); (0) MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, UK, corresponding author; (1) Center for Research and Interdisciplinarity (CRI), Universite de Paris, INSERM U1284, Paris, France; (2) MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, UK; Two major forces have contributed to the fast growth of human genetic data. One comes from medical research supported by governments and academic institutes, and the other from direct-to-consumer (DTC) sequencing companies. While the former benefits from meticulously designed sequencing standards and quality control procedures, the latter comes in various formats and from various sequencing methods, which change over time and with the particular needs of different companies. Thanks to members of the general public who shared their DNA data without constraint, we review over 7000 genomes made public between 2011 and 2020 and produced by more than six DTC sequencing companies. We provide an open-source toolkit, GenomePrep, that systematically parses, quality-checks and filters genome files and statistically problematic alleles to prepare consumer DNA datasets for research. The GenomePrep output is available in two common DNA data file formats to enable further analysis with other tools. We also provide for download the combined output for all OpenSNP array genomes processed in this paper in a single data freeze file.; http://www.sciencedirect.com/science/article/pii/S2001037021002786; Genotyping; Direct-to-consumer sequencing; Open genome; Personal genome; SNP arrays |
spellingShingle | Chang Lu; Bastian Greshake Tzovaras; Julian Gough; A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research; Computational and Structural Biotechnology Journal; Genotyping; Direct-to-consumer sequencing; Open genome; Personal genome; SNP arrays |
title | A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research |
title_sort | survey of direct to consumer genotype data and quality control tool genomeprep for research |
topic | Genotyping; Direct-to-consumer sequencing; Open genome; Personal genome; SNP arrays |
url | http://www.sciencedirect.com/science/article/pii/S2001037021002786 |