GeTallele: A Method for Analysis of DNA and RNA Allele Frequency Distributions

Bibliographic Details
Main Authors: Piotr Słowiński, Muzi Li, Paula Restrepo, Nawaf Alomran, Liam F. Spurr, Christian Miller, Krasimira Tsaneva-Atanasova, Anelia Horvath
Format: Article
Language: English
Published: Frontiers Media S.A. 2020-09-01
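Series: Frontiers in Bioengineering and Biotechnology
Subjects: variant allele fraction (VAF); RNA–DNA; earth mover's distance (EMD); circos plot; Farey sequence
Online Access: https://www.frontiersin.org/article/10.3389/fbioe.2020.01021/full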
collection DOAJ
description Variant allele frequencies (VAF) are an important measure of genetic variation that can be estimated at single-nucleotide variant (SNV) sites. RNA and DNA VAFs are used as indicators of a wide range of biological traits, including tumor purity and ploidy changes, allele-specific expression and gene-dosage transcriptional response. Here we present a novel methodology to assess gene and chromosomal allele asymmetries and to aid in identifying genomic alterations in RNA and DNA datasets. Our approach is based on analysis of the VAF distributions in chromosomal segments (continuous multi-SNV genomic regions). In each segment we estimate variant probability, a parameter of a random process that can generate synthetic VAF samples that closely resemble the observed data. We show that variant probability is a biologically interpretable quantitative descriptor of the VAF distribution in chromosomal segments, consistent with other approaches. To this end, we apply the proposed methodology to data from 72 samples obtained from patients with breast invasive carcinoma (BRCA) from The Cancer Genome Atlas (TCGA). We compare DNA and RNA VAF distributions from matched RNA and whole exome sequencing (WES) datasets and find that both genomic signals give very similar segmentation and estimated variant probability profiles. We also find a correlation between variant probability and copy number alterations (CNA). Finally, to demonstrate a practical application of variant probabilities, we use them to estimate tumor purity. Tumor purity estimates based on variant probabilities demonstrate good concordance with other approaches (Pearson's correlation between 0.44 and 0.76). Our evaluation suggests that variant probabilities can serve as a dependable descriptor of VAF distribution, further enabling the statistical comparison of matched DNA and RNA datasets. In addition, they provide conceptual and mechanistic insights into the relations between the structure of VAF distributions and genetic events. The methodology is implemented in a MATLAB toolbox that provides a suite of functions for analysis, statistical assessment and visualization of Genome and Transcriptome allele frequency distributions. GeTallele is available at: https://github.com/SlowinskiPiotr/GeTallele.
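The abstract states the core step only in words: in each segment, choose the variant probability whose random process generates synthetic VAF samples that best resemble the observed data, and the record's keywords list earth mover's distance (EMD). What follows is a minimal Python sketch of that idea under stated assumptions; it is not the GeTallele toolbox (which is written in MATLAB) and not necessarily the authors' exact model. It assumes that at an SNV with read depth n the variant read count is Binomial(n, vp) or Binomial(n, 1 − vp) with equal chance, that resemblance is scored by the 1-D EMD between observed and synthetic VAFs, and that vp is found by grid search over [0.5, 1]. The function names, the 0.01 grid step, the repetition count and the simulated read depths are all hypothetical.

```python
import random


def synthetic_vaf(depths, vp, rng):
    """Draw one synthetic VAF per SNV site (hypothetical model).

    Assumption: at each site the variant read count is binomial with
    success probability vp or 1 - vp, each chosen with probability 1/2.
    """
    vafs = []
    for n in depths:
        p = vp if rng.random() < 0.5 else 1.0 - vp
        # Binomial(n, p) draw as a sum of n Bernoulli trials (stdlib only).
        k = sum(1 for _ in range(n) if rng.random() < p)
        vafs.append(k / n)
    return vafs


def emd_1d(a, b):
    """Earth mover's (Wasserstein-1) distance between equal-size 1-D samples.

    For two empirical distributions with the same number of points, this
    equals the mean absolute difference between the sorted samples.
    """
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def estimate_variant_probability(observed, depths, n_rep=5, seed=0):
    """Grid-search vp in [0.5, 1.0] minimising the mean EMD to synthetic samples."""
    rng = random.Random(seed)
    # vp and 1 - vp give the same mixture, so only [0.5, 1.0] is searched.
    grid = [0.5 + i / 100 for i in range(51)]  # hypothetical 0.01 resolution
    best_vp, best_d = None, float("inf")
    for vp in grid:
        # Average over several synthetic draws to reduce sampling noise.
        d = sum(emd_1d(observed, synthetic_vaf(depths, vp, rng))
                for _ in range(n_rep)) / n_rep
        if d < best_d:
            best_vp, best_d = vp, d
    return best_vp


if __name__ == "__main__":
    depths = [40] * 300  # hypothetical read depths for one segment
    # Simulated "observed" data drawn from the same assumed model (not TCGA data).
    observed = synthetic_vaf(depths, 0.8, random.Random(1))
    print(estimate_variant_probability(observed, depths))
```

Restricting the grid to [0.5, 1] follows from the assumed symmetry of the mixture: vp and 1 − vp produce the same VAF distribution, so searching one half of the range is enough. On real data the read depths would come from the SNVs in the segment rather than a constant.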
first_indexed 2024-04-11T23:58:19Z
id doaj.art-3d4c2d83f07c46598df504e9a82158e6
institution Directory Open Access Journal
issn 2296-4185
last_indexed 2024-04-11T23:58:19Z
volume 8
doi 10.3389/fbioe.2020.01021
affiliation Piotr Słowiński: Department of Mathematics, College of Engineering, Mathematics and Physical Sciences, Living Systems Institute, Translational Research Exchange @ Exeter and The Engineering and Physical Sciences Research Council Centre for Predictive Modelling in Healthcare, University of Exeter, Exeter, United Kingdom
Muzi Li: McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, United States
Paula Restrepo: McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, United States; Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
Nawaf Alomran: McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, United States
Liam F. Spurr: McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, United States; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States; Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, United States; Biological Sciences Division, Pritzker School of Medicine, The University of Chicago, Chicago, IL, United States
Christian Miller: McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, United States
Krasimira Tsaneva-Atanasova: Department of Mathematics, College of Engineering, Mathematics and Physical Sciences, Living Systems Institute, Translational Research Exchange @ Exeter and The Engineering and Physical Sciences Research Council Centre for Predictive Modelling in Healthcare, University of Exeter, Exeter, United Kingdom; Department of Bioinformatics and Mathematical Modelling, Institute of Biophysics and Biomedical Engineering, Bulgarian Academy of Sciences, Sofia, Bulgaria
Anelia Horvath: McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, United States; Department of Pharmacology and Physiology, School of Medicine and Health Sciences, The George Washington University, Washington, DC, United States; Department of Biochemistry and Molecular Medicine, School of Medicine and Health Sciences, The George Washington University, Washington, DC, United States
topic variant allele fraction (VAF)
RNA—DNA
earth mover's distance (EMD)
circos plot
Farey sequence