A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules.


Bibliographic Details
Main Authors: Moustafa Abdalla, Mohamed Abdalla
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-04-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1010028
author Moustafa Abdalla
Mohamed Abdalla
collection DOAJ
description Genome-wide association studies (GWASs) for complex traits have implicated thousands of genetic loci. Most GWAS-nominated variants lie in non-coding regions, complicating the systematic translation of these findings into functional understanding. Here, we leverage convolutional neural networks to address this challenge. Our computational framework, peaBrain, models the transcriptional machinery of a tissue as a two-stage process: first, predicting the mean tissue-specific abundance of all genes; and second, incorporating the transcriptomic consequences of genotype variation to predict individual abundance on a subject-by-subject basis. We demonstrate that peaBrain accounts for the majority (>50%) of variance observed in mean transcript abundance across most tissues and outperforms regularized linear models in predicting the consequences of individual genotype variation. We highlight the validity of the peaBrain model by calculating non-coding impact scores that correlate with nucleotide evolutionary constraint and are also predictive of disease-associated variation and allele-specific transcription factor binding. We further show how these tissue-specific peaBrain scores can be leveraged to pinpoint functional tissues underlying complex traits, outperforming methods that depend on colocalization of eQTL and GWAS signals. We subsequently: (a) derive continuous dense embeddings of genes for downstream applications; (b) highlight the utility of the model in predicting the transcriptomic impact of small molecules and shRNA (on par with in vitro experimental replication of external test sets); (c) explore how peaBrain can be used to model difficult-to-study processes (such as neural induction); and (d) identify putatively functional eQTLs that are missed by high-throughput experimental approaches.
first_indexed 2024-12-12T01:26:46Z
format Article
id doaj.art-10500f028c1b443d89ceff31e3b9078d
institution Directory Open Access Journal
issn 1553-734X
1553-7358
language English
last_indexed 2024-12-12T01:26:46Z
publishDate 2022-04-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
spelling doaj.art-10500f028c1b443d89ceff31e3b9078d; 2022-12-22T00:43:04Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2022-04-01; vol. 18, no. 4, e1010028; 10.1371/journal.pcbi.1010028; A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules.; Moustafa Abdalla; Mohamed Abdalla; https://doi.org/10.1371/journal.pcbi.1010028
title A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules.