Replicate whole-genome next-generation sequencing data derived from Caucasian donor saliva samples

Bibliographic Details
Main Authors: Marcus Høy Hansen, Charlotte Guldborg Nyvold
Format: Article
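Language: English
Published: Elsevier 2021-10-01
Series: Data in Brief
Subjects: Whole-genome; Homo Sapiens genome; Next-generation sequencing (NGS); DNA sequencing; Raw data replicate
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340921006338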
collection DOAJ
description Next-generation sequencing (NGS) of whole genomes has become more accessible to biomedical researchers as the sequencing price continues to drop and more laboratories have NGS facilities or access to a core facility. However, the rapid and robust development of practical bioinformatics pipelines partly depends on convenient access to data for testing algorithms. Publicly available data sets constitute a part of this strategy. Here, we provide a triplicate whole-genome paired-end sequencing data set consisting of 1.38 billion raw sequencing reads derived from saliva DNA from a single anonymous male Caucasian donor, with average sequencing depths aimed at 30x for two of the samples and 4x for a low-coverage sample. The raw number of single nucleotide variants was 3.3–4 million, and the median variant read depths of GATK4-passed variants in the three samples were 22, 18, and 10. Of all variants, 81% were found in two or three of the samples, whereas 19% were singletons. The karyotype was evaluated as 46,XY with no apparent copy-number variation. The data set is provided without restrictions for research, educational, or commercial purposes.
id doaj.art-37250ef840b34d189f092908fa8cba4c
institution Directory of Open Access Journals
issn 2352-3409
spelling doaj.art-37250ef840b34d189f092908fa8cba4c
2022-12-21T22:39:39Z
eng
Elsevier
Data in Brief
2352-3409
2021-10-01
38
107349
Replicate whole-genome next-generation sequencing data derived from Caucasian donor saliva samples
Marcus Høy Hansen (affiliation 0)
Charlotte Guldborg Nyvold (affiliation 1)
0: Haematology-Pathology Research Laboratory, Research Unit for Haematology and Research Unit for Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Hematology, Odense University Hospital, Odense, Denmark; Corresponding author at: Haematology-Pathology Research Laboratory, Research Unit for Haematology and Research Unit for Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark.
1: Haematology-Pathology Research Laboratory, Research Unit for Haematology and Research Unit for Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Hematology, Odense University Hospital, Odense, Denmark
topic Whole-genome
Homo Sapiens genome
Next-generation sequencing (NGS)
DNA sequencing
Raw data replicate