A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research

Bibliographic Details
Main Authors: Chang Lu, Bastian Greshake Tzovaras, Julian Gough
Format: Article
Language: English
Published: Elsevier, 2021-01-01
Series: Computational and Structural Biotechnology Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037021002786

Description
Two major forces have driven the rapid growth of human genetic data: medical research supported by governments and academic institutions, and direct-to-consumer (DTC) sequencing companies. The former benefits from meticulously designed sequencing standards and quality control procedures. The latter comes in various formats and sequencing methods, which change over time and with the particular needs of different companies. Thanks to members of the public who have shared their DNA data without restriction, we review more than 7,000 genomes made public between 2011 and 2020 and produced by more than six DTC sequencing companies. We also provide GenomePrep, an open-source toolkit that prepares consumer DNA datasets for research. It systematically parses, quality-checks and filters genome files, and it also filters out statistically problematic alleles. GenomePrep output is available in two common DNA data file formats, enabling further analysis with other tools. The combined output for all OpenSNP array genomes processed in this paper is also available for download as a single data freeze file.

Additional Details
ISSN: 2001-0370
Volume: 19, pages 3747–3754
Affiliations: Chang Lu (corresponding author) and Julian Gough, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, UK; Bastian Greshake Tzovaras, Center for Research and Interdisciplinarity (CRI), Université de Paris, INSERM U1284, Paris, France
Subjects: Genotyping; Direct-to-consumer sequencing; Open genome; Personal genome; SNP arrays
Collection: Directory of Open Access Journals (DOAJ)
Record ID: doaj.art-b30848b348304fe4a2776b0b54fa6402
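The record's description says GenomePrep parses, quality-checks and filters consumer genome files. What that step involves can be sketched as below. This is a minimal sketch under stated assumptions, not GenomePrep's code or API.

- **Input layout.** The input is assumed to follow the common 23andMe-style raw data layout: `#` comment lines followed by tab-separated rsid, chromosome, position and genotype columns.
- **QC rules.** The rules are hypothetical illustrations: the accepted chromosome set, `--` as the no-call code, and A/C/G/T-only genotypes of length one or two.
- **Names.** `parse_and_filter` and `QCReport` are hypothetical names introduced here.

```python
import io
from dataclasses import dataclass

# Assumption: 23andMe-style chromosome labels; other vendors may use e.g. 23-26.
VALID_CHROMS = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}
VALID_BASES = set("ACGT")


@dataclass
class QCReport:
    """Counts of kept and rejected rows (hypothetical QC categories)."""
    kept: int = 0
    no_call: int = 0
    bad_allele: int = 0
    bad_chrom: int = 0
    malformed: int = 0
    duplicate: int = 0


def parse_and_filter(lines):
    """Parse an assumed rsid/chrom/pos/genotype file; return (calls, report).

    calls maps rsid -> (chromosome, position, genotype) for rows passing QC.
    Illustrative only; not GenomePrep's filtering logic.
    """
    calls, report = {}, QCReport()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) != 4 or not fields[2].isdigit():
            report.malformed += 1  # wrong column count or non-numeric position
            continue
        rsid, chrom, pos, genotype = fields
        if chrom not in VALID_CHROMS:
            report.bad_chrom += 1
        elif genotype == "--":
            report.no_call += 1  # assumed no-call code
        elif not genotype or len(genotype) > 2 or not set(genotype) <= VALID_BASES:
            report.bad_allele += 1  # e.g. indel codes such as "DI"
        elif rsid in calls:
            report.duplicate += 1  # keep only the first occurrence
        else:
            calls[rsid] = (chrom, int(pos), genotype)
            report.kept += 1
    return calls, report


# Usage on a tiny made-up sample (not real data):
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1\t1\t1000\tAG\n"
    "rs2\t1\t2000\t--\n"
    "rs3\t2\t3000\tDI\n"
    "rs4\t25\t4000\tCC\n"
    "rs1\t1\t1000\tAG\n"
)
calls, report = parse_and_filter(sample)
print(report)
```

Counting each rejection reason separately, rather than silently dropping rows, makes the per-file QC outcome easy to compare across files from different companies. Real vendor formats differ in column layout and allele encoding, so each rule above would need adjusting per source.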