A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data


Bibliographic Details
Main Authors: Johanna Bertl, Qianyun Guo, Malene Juul, Søren Besenbacher, Morten Muhlig Nielsen, Henrik Hornshøj, Jakob Skou Pedersen, Asger Hobolth
Format: Article
Language: English
Published: BMC 2018-04-01
Series: BMC Bioinformatics
Subjects: Multinomial logistic regression; Site-specific model; Somatic cancer mutations
Online Access: http://link.springer.com/article/10.1186/s12859-018-2141-2
author Johanna Bertl
Qianyun Guo
Malene Juul
Søren Besenbacher
Morten Muhlig Nielsen
Henrik Hornshøj
Jakob Skou Pedersen
Asger Hobolth
collection DOAJ
description Abstract
Background: Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically within the element under consideration.
Results: To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare region-based and site-specific models in their ability to predict the mutation rate. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than region-based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors of the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures.
Conclusion: We find that our site-specific multinomial regression model outperforms the region-based models. The possibility of including genomic variables on different scales and patient-specific variables makes it a versatile framework for studying different mutational mechanisms. Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
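The central idea of the abstract, modelling mutation probabilities per site instead of averaging explanatory variables over a region, can be illustrated with a minimal Python sketch. It assumes a single hypothetical covariate, hypothetical outcome categories ("none", "transition", "transversion") and made-up coefficients; it is not the authors' fitted model or code. Each site receives multinomial logistic (softmax) probabilities with "none" as the reference category, and the expected number of mutations in a region is computed either by summing per-site probabilities or once from the region-averaged covariate.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical outcome categories for one site; "none" (no mutation) is the
# reference category, whose linear predictor is fixed at 0.
CATEGORIES = ("none", "transition", "transversion")

# Hypothetical (intercept, slope) per non-reference category. These numbers
# are illustrative only and are NOT estimates from the paper.
COEFS: Dict[str, Tuple[float, float]] = {
    "transition": (-9.0, 1.5),
    "transversion": (-9.5, 1.0),
}


def site_probabilities(x: float) -> Dict[str, float]:
    """Multinomial logistic (softmax) probabilities for a site with covariate x."""
    scores = {"none": 0.0}
    for cat, (b0, b1) in COEFS.items():
        scores[cat] = b0 + b1 * x
    denom = sum(math.exp(s) for s in scores.values())
    return {cat: math.exp(s) / denom for cat, s in scores.items()}


def expected_mutations_site_specific(xs: List[float]) -> float:
    """Site-specific: sum P(any mutation) = 1 - P(none) over every site."""
    return sum(1.0 - site_probabilities(x)["none"] for x in xs)


def expected_mutations_regional(xs: List[float]) -> float:
    """Regional shortcut: average the covariate first, apply the model once."""
    x_mean = sum(xs) / len(xs)
    return len(xs) * (1.0 - site_probabilities(x_mean)["none"])


if __name__ == "__main__":
    # Hypothetical 1000-site region: covariate low at most sites, high at a few.
    region = [0.0] * 900 + [4.0] * 100
    print(expected_mutations_site_specific(region))
    print(expected_mutations_regional(region))
```

In this toy region, summing per-site probabilities gives roughly 5.3 expected mutations, whereas the averaged-covariate shortcut gives roughly 0.34, because the mutation probability is nonlinear in the covariate. When the covariate is constant across the region, the two approaches agree.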
first_indexed 2024-12-13T18:27:02Z
format Article
id doaj.art-1b55cf0177f64be48276f4a107619169
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-13T18:27:02Z
publishDate 2018-04-01
publisher BMC
series BMC Bioinformatics
doi 10.1186/s12859-018-2141-2
affiliation Department of Molecular Medicine, Aarhus University
Bioinformatics Research Centre, Aarhus University
title A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data
topic Multinomial logistic regression
Site-specific model
Somatic cancer mutations
url http://link.springer.com/article/10.1186/s12859-018-2141-2