GeTallele: A Method for Analysis of DNA and RNA Allele Frequency Distributions
Variant allele frequencies (VAF) are an important measure of genetic variation that can be estimated at single-nucleotide variant (SNV) sites. RNA and DNA VAFs are used as indicators of a wide range of biological traits, including tumor purity and ploidy changes, allele-specific expression and gene-...
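Main Authors: | Piotr Słowiński, Muzi Li, Paula Restrepo, Nawaf Alomran, Liam F. Spurr, Christian Miller, Krasimira Tsaneva-Atanasova, Anelia Horvath |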
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2020-09-01 |
Series: | Frontiers in Bioengineering and Biotechnology |
Subjects: | variant allele fraction (VAF); RNA—DNA; earth mover's distance (EMD); circos plot; farey sequence |
Online Access: | https://www.frontiersin.org/article/10.3389/fbioe.2020.01021/full |
_version_ | 1811192876879577088 |
---|---|
author | Piotr Słowiński, Muzi Li, Paula Restrepo, Nawaf Alomran, Liam F. Spurr, Christian Miller, Krasimira Tsaneva-Atanasova, Anelia Horvath |
author_facet | Piotr Słowiński, Muzi Li, Paula Restrepo, Nawaf Alomran, Liam F. Spurr, Christian Miller, Krasimira Tsaneva-Atanasova, Anelia Horvath |
author_sort | Piotr Słowiński |
collection | DOAJ |
description | Variant allele frequencies (VAF) are an important measure of genetic variation that can be estimated at single-nucleotide variant (SNV) sites. RNA and DNA VAFs are used as indicators of a wide range of biological traits, including tumor purity and ploidy changes, allele-specific expression and gene-dosage transcriptional response. Here we present a novel methodology to assess gene and chromosomal allele asymmetries and to aid in identifying genomic alterations in RNA and DNA datasets. Our approach is based on analysis of the VAF distributions in chromosomal segments (continuous multi-SNV genomic regions). In each segment we estimate variant probability, a parameter of a random process that can generate synthetic VAF samples that closely resemble the observed data. We show that variant probability is a biologically interpretable quantitative descriptor of the VAF distribution in chromosomal segments and that it is consistent with other approaches. To this end, we apply the proposed methodology to data from 72 samples obtained from patients with breast invasive carcinoma (BRCA) from The Cancer Genome Atlas (TCGA). We compare DNA and RNA VAF distributions from matched RNA and whole exome sequencing (WES) datasets and find that both genomic signals give very similar segmentation and estimated variant probability profiles. We also find a correlation between variant probability and copy number alterations (CNA). Finally, to demonstrate a practical application of variant probabilities, we use them to estimate tumor purity. Tumor purity estimates based on variant probabilities demonstrate good concordance with other approaches (Pearson's correlation between 0.44 and 0.76). Our evaluation suggests that variant probabilities can serve as a dependable descriptor of VAF distribution, further enabling the statistical comparison of matched DNA and RNA datasets. Finally, they provide conceptual and mechanistic insights into relations between the structure of VAF distributions and genetic events. The methodology is implemented in a MATLAB toolbox that provides a suite of functions for analysis, statistical assessment and visualization of Genome and Transcriptome allele frequency distributions. GeTallele is available at: https://github.com/SlowinskiPiotr/GeTallele. |
first_indexed | 2024-04-11T23:58:19Z |
format | Article |
id | doaj.art-3d4c2d83f07c46598df504e9a82158e6 |
institution | Directory Open Access Journal |
issn | 2296-4185 |
language | English |
last_indexed | 2024-04-11T23:58:19Z |
publishDate | 2020-09-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Bioengineering and Biotechnology |
spelling | doaj.art-3d4c2d83f07c46598df504e9a82158e6; 2022-12-22T03:56:17Z; eng; Frontiers Media S.A.; Frontiers in Bioengineering and Biotechnology; 2296-4185; 2020-09-01; 8; 10.3389/fbioe.2020.01021; 554381; GeTallele: A Method for Analysis of DNA and RNA Allele Frequency Distributions; Authors: Piotr Słowiński [0]; Muzi Li [1]; Paula Restrepo [2, 3]; Nawaf Alomran [4]; Liam F. Spurr [5, 6, 7, 8]; Christian Miller [9]; Krasimira Tsaneva-Atanasova [10, 11]; Anelia Horvath [12, 13, 14]; Affiliations: [0, 10] Department of Mathematics, College of Engineering, Mathematics and Physical Sciences, Living Systems Institute, Translational Research Exchange @ Exeter and The Engineering and Physical Sciences Research Council Centre for Predictive Modelling in Healthcare, University of Exeter, Exeter, United Kingdom; [1, 2, 4, 5, 9, 12] McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, United States; [3] Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States; [6] Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States; [7] Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, United States; [8] Biological Sciences Division, Pritzker School of Medicine, The University of Chicago, Chicago, IL, United States; [11] Department of Bioinformatics and Mathematical Modelling, Institute of Biophysics and Biomedical Engineering, Bulgarian Academy of Sciences, Sofia, Bulgaria; [13] Department of Pharmacology and Physiology, School of Medicine and Health Sciences, The George Washington University, Washington, DC, United States; [14] Department of Biochemistry and Molecular Medicine, School of Medicine and Health Sciences, The George Washington University, Washington, DC, United States; https://www.frontiersin.org/article/10.3389/fbioe.2020.01021/full; Keywords: variant allele fraction (VAF); RNA—DNA; earth mover's distance (EMD); circos plot; farey sequence |
spellingShingle | Piotr Słowiński, Muzi Li, Paula Restrepo, Nawaf Alomran, Liam F. Spurr, Christian Miller, Krasimira Tsaneva-Atanasova, Anelia Horvath; GeTallele: A Method for Analysis of DNA and RNA Allele Frequency Distributions; Frontiers in Bioengineering and Biotechnology; variant allele fraction (VAF); RNA—DNA; earth mover's distance (EMD); circos plot; farey sequence |
title | GeTallele: A Method for Analysis of DNA and RNA Allele Frequency Distributions |
topic | variant allele fraction (VAF); RNA—DNA; earth mover's distance (EMD); circos plot; farey sequence |
url | https://www.frontiersin.org/article/10.3389/fbioe.2020.01021/full |
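
The description above characterises variant probability as a parameter of a random process that generates synthetic VAF samples resembling the observed ones, and the record's keywords mention earth mover's distance (EMD). The Python sketch below illustrates that general idea only; it is not the GeTallele MATLAB implementation. The binomial read-sampling model, the alternating assignment of sites to p and 1 - p, the grid search over [0.5, 1], and all function names are assumptions made for illustration.

```python
import random


def synthetic_vaf(p_variant, depths, rng):
    """Draw one synthetic VAF per site from an assumed binomial read model."""
    vafs = []
    for i, depth in enumerate(depths):
        # Assumption: sites alternate between p and 1 - p, giving a
        # distribution that is symmetric around 0.5.
        p = p_variant if i % 2 == 0 else 1.0 - p_variant
        # Count reads that carry the variant allele at this site.
        variant_reads = sum(rng.random() < p for _ in range(depth))
        vafs.append(variant_reads / depth)
    return vafs


def emd_1d(a, b):
    """Earth mover's distance between two equal-size 1-D samples."""
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    # For equal-size, equally weighted samples, EMD is the mean absolute
    # difference between the sorted values.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def estimate_variant_probability(observed, depths, grid_step=0.01, seed=0):
    """Grid-search the p in [0.5, 1] whose synthetic sample is closest in EMD."""
    rng = random.Random(seed)
    best_p, best_dist = 0.5, float("inf")
    n_steps = int(round(0.5 / grid_step))
    for i in range(n_steps + 1):
        p = 0.5 + i * grid_step
        dist = emd_1d(observed, synthetic_vaf(p, depths, rng))
        if dist < best_dist:
            best_p, best_dist = p, dist
    return best_p


if __name__ == "__main__":
    # Hypothetical segment: 400 SNV sites, each covered by 50 reads.
    depths = [50] * 400
    observed = synthetic_vaf(0.7, depths, random.Random(1))
    print(estimate_variant_probability(observed, depths))
```

Because the assumed distribution is symmetric, p and 1 - p are indistinguishable, which is why the search is restricted to [0.5, 1]. A real analysis would use per-site read depths taken from the sequencing data rather than a constant.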