Data-driven modelling of mutational hotspots and in silico predictors in hypertrophic cardiomyopathy


Bibliographic Details
Main Authors: Waring, A, Harper, A, Salatino, S, Kramer, C, Neubauer, S, Thomson, K, Watkins, H, Farrall, M
Format: Journal article
Language: English
Published: BMJ Publishing Group 2020
Description:
<p><strong>Background</strong> Although rare missense variants in Mendelian disease genes often cluster in specific regions of proteins, it is unclear how to account for this clustering when evaluating the pathogenicity of a gene or variant. Here we introduce methods for gene association and variant interpretation that use this powerful signal.</p>
<p><strong>Methods</strong> We present statistical methods to detect missense variant clustering (BIN-test) combined with burden information (ClusterBurden). We introduce a flexible generalised additive modelling (GAM) framework to identify mutational hotspots using burden and clustering information (hotspot model), optionally supplemented by in silico predictors (hotspot+ model). The methods were applied to synthetic data and to a case–control dataset comprising 5338 hypertrophic cardiomyopathy patients and 125 748 population reference samples over 34 putative cardiomyopathy genes.</p>
<p><strong>Results</strong> In simulations, the BIN-test was almost twice as powerful as the Anderson-Darling or Kolmogorov-Smirnov tests; ClusterBurden was computationally faster and more powerful than alternative position-informed methods. For 6/8 sarcomeric genes with strong clustering, ClusterBurden showed enhanced power over burden alone, equivalent to increasing the sample size by 50%. Hotspot+ models that combine burden, clustering and in silico predictors outperform generic pathogenicity predictors and effectively integrate ACMG criteria PM1 and PP3 to yield strong or moderate evidence of pathogenicity for 31.8% of examined variants of uncertain significance.</p>
<p><strong>Conclusion</strong> GAMs represent a unified statistical modelling framework to combine burden, clustering and functional information. Hotspot models can refine maps of regional burden, and hotspot+ models can be powerful predictors of variant pathogenicity. The BIN-test is a fast, powerful approach to detect missense variant clustering that, when combined with burden information (ClusterBurden), may enhance disease-gene discovery.</p>
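The abstract does not spell out how the BIN-test is computed, so the following is only an illustrative sketch of one way a binned clustering test could work, not the authors' implementation. It assumes that variant residue positions are grouped into equal-width bins along the protein, that case and control bin counts are compared with a Pearson chi-square statistic, and that significance is assessed by permuting case/control labels (a stand-in chosen to keep the example dependency-free). All function names, the default of 10 bins and the number of permutations are hypothetical.

```python
import random


def bin_counts(positions, protein_length, n_bins):
    """Count variants per equal-width bin; positions are 1-based residue numbers."""
    counts = [0] * n_bins
    for pos in positions:
        idx = min((pos - 1) * n_bins // protein_length, n_bins - 1)
        counts[idx] += 1
    return counts


def chi_square_stat(case_counts, control_counts):
    """Pearson chi-square statistic for a 2 x k table; bins empty in both groups are skipped."""
    n_case, n_ctrl = sum(case_counts), sum(control_counts)
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("both groups need at least one variant")
    total = n_case + n_ctrl
    stat = 0.0
    for a, b in zip(case_counts, control_counts):
        col = a + b
        if col == 0:
            continue
        exp_a = n_case * col / total  # expected case count in this bin
        exp_b = n_ctrl * col / total  # expected control count in this bin
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat


def binned_position_test(case_pos, control_pos, protein_length,
                         n_bins=10, n_perm=2000, seed=0):
    """Return (statistic, permutation p-value) for a difference in positional distribution."""
    rng = random.Random(seed)
    observed = chi_square_stat(
        bin_counts(case_pos, protein_length, n_bins),
        bin_counts(control_pos, protein_length, n_bins),
    )
    pooled = list(case_pos) + list(control_pos)
    n_case = len(case_pos)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        stat = chi_square_stat(
            bin_counts(pooled[:n_case], protein_length, n_bins),
            bin_counts(pooled[n_case:], protein_length, n_bins),
        )
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic positions only: cases concentrated near residues 100-149,
    # controls spread along a 1000-residue protein.
    cases = [100 + i % 50 for i in range(60)]
    controls = [1 + (i * 37) % 1000 for i in range(200)]
    stat, p_value = binned_position_test(cases, controls, protein_length=1000)
    print(f"chi-square = {stat:.1f}, permutation p = {p_value:.4f}")
```

A test of this kind is sensitive only to where variants fall, not to how many there are, which is why the abstract describes combining it with burden information.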
Identifier: oxford-uuid:0d16b222-2a31-46e4-ba1a-bebac623335d
Institution: University of Oxford