A validated lineage-derived somatic truth data set enables benchmarking in cancer genome analysis

Existing cancer benchmark data sets for human sequencing data use germline variants, synthetic methods, or expensive validations, none of which are satisfactory for providing a large collection of true somatic variation across a whole genome. Here we propose a data set, Lineage derived Somatic Truth...

Bibliographic Details
Main Authors: Shand, Megan; Soto, Jose; Lichtenstein, Lee; Benjamin, David; Farjoun, Yossi; Brody, Yehuda; Maruvka, Yosef; Blainey, Paul C; Banks, Eric
Other Authors: Massachusetts Institute of Technology. Department of Biological Engineering
Format: Article
Language: English
Published: Springer Science and Business Media LLC 2021
Online Access: https://hdl.handle.net/1721.1/133024
collection MIT
description Existing cancer benchmark data sets for human sequencing data use germline variants, synthetic methods, or expensive validations, none of which are satisfactory for providing a large collection of true somatic variation across a whole genome. Here we propose a data set, Lineage derived Somatic Truth (LinST), of short somatic mutations in the HT115 colon cancer cell-line, that are validated using a known cell lineage that includes thousands of mutations and a high confidence region covering 2.7 gigabases per sample.
id mit-1721.1/133024
institution Massachusetts Institute of Technology
Record: mit-1721.1/133024 (2021-10-27T19:54:46Z)
Affiliations: Massachusetts Institute of Technology. Department of Biological Engineering; Koch Institute for Integrative Cancer Research at MIT
Record dates: 2021-10-18T16:49:37Z; 2021-10-18T16:49:37Z; 2020-12; 2021-08-25T16:18:08Z
Type: Article (http://purl.org/eprint/type/JournalArticle)
Citation: Shand, Megan, Soto, Jose, Lichtenstein, Lee, Benjamin, David, Farjoun, Yossi et al. 2020. "A validated lineage-derived somatic truth data set enables benchmarking in cancer genome analysis." Communications Biology, 3 (1).
Language code: en
DOI: 10.1038/S42003-020-01460-9
Journal: Communications Biology
License: Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/)
File format: application/pdf
Publisher: Springer Science and Business Media LLC
Source: Nature