A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules.
Main Authors: | Moustafa Abdalla, Mohamed Abdalla |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2022-04-01 |
Series: | PLoS Computational Biology |
Online Access: | https://doi.org/10.1371/journal.pcbi.1010028 |
author | Moustafa Abdalla, Mohamed Abdalla |
---|---|
collection | DOAJ |
description | Genome-wide association studies (GWASs) for complex traits have implicated thousands of genetic loci. Most GWAS-nominated variants lie in noncoding regions, complicating the systematic translation of these findings into functional understanding. Here, we leverage convolutional neural networks to assist in this challenge. Our computational framework, peaBrain, models the transcriptional machinery of a tissue as a two-stage process: first, predicting the mean tissue-specific abundance of all genes and second, incorporating the transcriptomic consequences of genotype variation to predict individual abundance on a subject-by-subject basis. We demonstrate that peaBrain accounts for the majority (>50%) of variance observed in mean transcript abundance across most tissues and outperforms regularized linear models in predicting the consequences of individual genotype variation. We highlight the validity of the peaBrain model by calculating non-coding impact scores that correlate with nucleotide evolutionary constraint and are also predictive of disease-associated variation and allele-specific transcription factor binding. We further show how these tissue-specific peaBrain scores can be leveraged to pinpoint functional tissues underlying complex traits, outperforming methods that depend on colocalization of eQTL and GWAS signals. We subsequently: (a) derive continuous dense embeddings of genes for downstream applications; (b) highlight the utility of the model in predicting the transcriptomic impact of small molecules and shRNA (on par with in vitro experimental replication of external test sets); (c) explore how peaBrain can be used to model difficult-to-study processes (such as neural induction); and (d) identify putatively functional eQTLs that are missed by high-throughput experimental approaches. |
first_indexed | 2024-12-12T01:26:46Z |
format | Article |
id | doaj.art-10500f028c1b443d89ceff31e3b9078d |
institution | Directory Open Access Journal |
issn | 1553-734X, 1553-7358 |
language | English |
last_indexed | 2024-12-12T01:26:46Z |
publishDate | 2022-04-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS Computational Biology |
citation | PLoS Computational Biology 18(4): e1010028 (2022-04-01). https://doi.org/10.1371/journal.pcbi.1010028 |
title | A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules. |
url | https://doi.org/10.1371/journal.pcbi.1010028 |