Replicate whole-genome next-generation sequencing data derived from Caucasian donor saliva samples
Next-generation sequencing (NGS) of whole genomes has become more accessible to biomedical researchers as the sequencing price continues to drop, and more laboratories have NGS facilities or have access to a core facility. However, the rapid and robust development of practical bioinformatics pipelin...
Main Authors: | Marcus Høy Hansen, Charlotte Guldborg Nyvold |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2021-10-01 |
Series: | Data in Brief |
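Subjects: | Whole-genome; Homo Sapiens genome; Next-generation sequencing (NGS); DNA sequencing; Raw data replicate |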
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340921006338 |
_version_ | 1818580618818093056 |
---|---|
author | Marcus Høy Hansen; Charlotte Guldborg Nyvold |
author_facet | Marcus Høy Hansen; Charlotte Guldborg Nyvold |
author_sort | Marcus Høy Hansen |
collection | DOAJ |
description | Next-generation sequencing (NGS) of whole genomes has become more accessible to biomedical researchers as the sequencing price continues to drop, and more laboratories have NGS facilities or have access to a core facility. However, the rapid and robust development of practical bioinformatics pipelines partly depends on convenient access to data for the testing of algorithms. Publicly available data sets constitute a part of this strategy. Here, we provide a triplicate whole-genome paired-end sequencing data set, consisting of 1.38 billion raw sequencing reads derived from saliva DNA from a single anonymous male Caucasian donor, with the average sequencing depths aimed at 30x for two of the samples and 4x for a low-coverage sample. The raw number of single nucleotide variants was 3.3–4 million, and the median variant read depths of GATK4-passed variants in the three samples were 22, 18, and 10. Of all variants, 81% were found in two or three of the samples, whereas 19% were singletons. The karyotype was evaluated as 46,XY with no apparent copy-number variation. The data set is provided without restrictions for research, educational or commercial purposes. |
first_indexed | 2024-12-16T07:20:28Z |
format | Article |
id | doaj.art-37250ef840b34d189f092908fa8cba4c |
institution | Directory Open Access Journal |
issn | 2352-3409 |
language | English |
last_indexed | 2024-12-16T07:20:28Z |
publishDate | 2021-10-01 |
publisher | Elsevier |
record_format | Article |
series | Data in Brief |
spelling | doaj.art-37250ef840b34d189f092908fa8cba4c; 2022-12-21T22:39:39Z; eng; Elsevier; Data in Brief; 2352-3409; 2021-10-01; 38; 107349; Replicate whole-genome next-generation sequencing data derived from Caucasian donor saliva samples; Marcus Høy Hansen (0); Charlotte Guldborg Nyvold (1); (0) Haematology-Pathology Research Laboratory, Research Unit for Haematology and Research Unit for Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Hematology, Odense University Hospital, Odense, Denmark; Corresponding author at: Haematology-Pathology Research Laboratory, Research Unit for Haematology and Research Unit for Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark. (1) Haematology-Pathology Research Laboratory, Research Unit for Haematology and Research Unit for Pathology, University of Southern Denmark and Odense University Hospital, Odense, Denmark; Department of Hematology, Odense University Hospital, Odense, Denmark; http://www.sciencedirect.com/science/article/pii/S2352340921006338; Whole-genome; Homo Sapiens genome; Next-generation sequencing (NGS); DNA sequencing; Raw data replicate |
title | Replicate whole-genome next-generation sequencing data derived from Caucasian donor saliva samples |
topic | Whole-genome; Homo Sapiens genome; Next-generation sequencing (NGS); DNA sequencing; Raw data replicate |
url | http://www.sciencedirect.com/science/article/pii/S2352340921006338 |