A curated census of pathogenic and likely pathogenic UTR variants and evaluation of deep learning models for variant effect prediction

Introduction: Variants in 5′ and 3′ untranslated regions (UTR) contribute to rare disease. While predictive algorithms to assist in classifying pathogenicity can potentially be highly valuable, the utility of these tools is often unclear, as it depends on carefully selected training and validation c...


Bibliographic Details
Main Authors: Emma Bohn, Tammy T. Y. Lau, Omar Wagih, Tehmina Masud, Daniele Merico
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-09-01
Series: Frontiers in Molecular Biosciences
Online Access: https://www.frontiersin.org/articles/10.3389/fmolb.2023.1257550/full
Description:
Introduction: Variants in 5′ and 3′ untranslated regions (UTRs) contribute to rare disease. While predictive algorithms to assist in classifying pathogenicity can potentially be highly valuable, the utility of these tools is often unclear, as it depends on carefully selected training and validation conditions. To address this, we developed a high-confidence set of pathogenic (P) and likely pathogenic (LP) variants and assessed deep learning (DL) models for predicting their molecular effects.
Methods: 3′ and 5′ UTR variants documented as P or LP (P/LP) were obtained from ClinVar and refined by reviewing the annotated variant effect and reassessing evidence of pathogenicity following published guidelines. Prediction scores from sequence-based DL models were compared between three groups: P/LP variants acting through the mechanism for which the model was designed (model-matched), those operating through other mechanisms (model-mismatched), and putative benign variants. PhyloP was used to compare conservation scores between P/LP and putative benign variants.
Results: A total of 295 3′ and 188 5′ UTR variants were obtained from ClinVar, of which 26 3′ and 68 5′ UTR variants were classified as P/LP. Predictions by DL models achieved statistically significant differences when comparing model-matched P/LP variants to both putative benign variants and model-mismatched P/LP variants, as well as when comparing all P/LP variants to putative benign variants. PhyloP conservation scores were significantly higher among P/LP variants than among putative benign variants for both the 3′ and 5′ UTR.
Discussion: In conclusion, we present a high-confidence set of P/LP 3′ and 5′ UTR variants spanning a range of mechanisms and supported by detailed pathogenicity and molecular mechanism evidence curation. Predictions from DL models further substantiate these classifications. These datasets will support further development and validation of DL algorithms designed to predict the functional impact of variants that may be implicated in rare disease.
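
The three-group comparison described in the Methods can be illustrated with a rank-based statistic. The abstract does not say which statistical test the authors used, so the sketch below is only an assumption: it computes a Mann-Whitney U statistic and the corresponding probability that a score from one group exceeds a score from another. The function name `mann_whitney_u`, the group labels, and all score values are hypothetical placeholders and are not data from the study.

```python
from itertools import product


def mann_whitney_u(group_a, group_b):
    """Return (U, U / (n_a * n_b)) for group_a versus group_b.

    U counts the pairs in which the group_a score is higher; ties count 0.5.
    The normalised value is the probability that a randomly chosen group_a
    score exceeds a randomly chosen group_b score (equivalent to an AUC).
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    u = 0.0
    for a, b in product(group_a, group_b):
        if a > b:
            u += 1.0
        elif a == b:
            u += 0.5
    return u, u / (len(group_a) * len(group_b))


# Hypothetical placeholder scores, NOT values reported in the article.
scores = {
    "model_matched": [0.9, 0.8, 0.7],
    "model_mismatched": [0.4, 0.6, 0.3],
    "putative_benign": [0.1, 0.2, 0.5, 0.3],
}

# The three comparisons named in the abstract.
matched_vs_benign = mann_whitney_u(
    scores["model_matched"], scores["putative_benign"]
)
matched_vs_mismatched = mann_whitney_u(
    scores["model_matched"], scores["model_mismatched"]
)
all_plp = scores["model_matched"] + scores["model_mismatched"]
all_vs_benign = mann_whitney_u(all_plp, scores["putative_benign"])

print(matched_vs_benign, matched_vs_mismatched, all_vs_benign)
```

A significance test would additionally need a p-value (for example from a permutation procedure or a statistics library); the statistic alone only describes how well the two score distributions separate.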
ISSN: 2296-889X
Record ID: doaj.art-4467f0ae4f5a40f38d24c765e1f8d6d2
Institution: Directory Open Access Journal
First indexed: 2024-03-12T01:49:10Z
Last indexed: 2024-03-12T01:49:10Z
Volume: 10
DOI: 10.3389/fmolb.2023.1257550
Article number: 1257550
Author affiliations:
Emma Bohn: Deep Genomics Inc., Toronto, ON, Canada
Tammy T. Y. Lau: Deep Genomics Inc., Toronto, ON, Canada
Omar Wagih: Deep Genomics Inc., Toronto, ON, Canada
Tehmina Masud: Deep Genomics Inc., Toronto, ON, Canada
Daniele Merico: Deep Genomics Inc., Toronto, ON, Canada; The Centre for Applied Genomics, Hospital for Sick Children, Toronto, ON, Canada
Subjects: deep learning, non-coding variation, rare disease, untranslated region (UTR), variant classification