A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules.

Bibliographic Details
Main Authors:	Moustafa Abdalla, Mohamed Abdalla
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2022-04-01
Series:	PLoS Computational Biology
Online Access:	https://doi.org/10.1371/journal.pcbi.1010028

collection	DOAJ
description	Genome-wide association studies (GWASs) for complex traits have implicated thousands of genetic loci. Most GWAS-nominated variants lie in non-coding regions, complicating the systematic translation of these findings into functional understanding. Here, we leverage convolutional neural networks to assist in this challenge. Our computational framework, peaBrain, models the transcriptional machinery of a tissue as a two-stage process: first, predicting the mean tissue-specific abundance of all genes and second, incorporating the transcriptomic consequences of genotype variation to predict individual abundance on a subject-by-subject basis. We demonstrate that peaBrain accounts for the majority (>50%) of variance observed in mean transcript abundance across most tissues and outperforms regularized linear models in predicting the consequences of individual genotype variation. We highlight the validity of the peaBrain model by calculating non-coding impact scores that correlate with nucleotide evolutionary constraint and are also predictive of disease-associated variation and allele-specific transcription factor binding. We further show how these tissue-specific peaBrain scores can be leveraged to pinpoint functional tissues underlying complex traits, outperforming methods that depend on colocalization of eQTL and GWAS signals. We subsequently: (a) derive continuous dense embeddings of genes for downstream applications; (b) highlight the utility of the model in predicting the transcriptomic impact of small molecules and shRNA (on par with in vitro experimental replication of external test sets); (c) explore how peaBrain can be used to model difficult-to-study processes (such as neural induction); and (d) identify putatively functional eQTLs that are missed by high-throughput experimental approaches.
first_indexed	2024-12-12T01:26:46Z
id	doaj.art-10500f028c1b443d89ceff31e3b9078d
institution	Directory Open Access Journal
issn	1553-734X, 1553-7358
last_indexed	2024-12-12T01:26:46Z
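
The description refers to non-coding impact scores derived from a convolutional sequence model. This record does not give the paper's exact scoring procedure; a common approach for such models, assumed here, is to score a variant as the model's prediction on the alternate sequence minus its prediction on the reference sequence, repeated over every possible substitution (in silico saturation mutagenesis). The sketch below uses a hypothetical one-filter `ToyConvModel` with random weights in place of a trained network; `ToyConvModel`, `variant_impact`, and `saturation_mutagenesis` are illustrative names, not part of peaBrain.

```python
import random

BASES = "ACGT"


def one_hot(seq):
    """One-hot encode a DNA string over A, C, G, T; other symbols (e.g. N) become all-zero rows."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


class ToyConvModel:
    """Hypothetical stand-in for a trained sequence model (NOT peaBrain):
    one convolutional filter with random weights, a ReLU, and global average
    pooling down to a single predicted-abundance score."""

    def __init__(self, width=5, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.kernel = [[rng.gauss(0.0, 1.0) for _ in BASES] for _ in range(width)]
        self.bias = 0.0

    def predict(self, seq):
        x = one_hot(seq)
        if len(x) < self.width:
            raise ValueError("sequence is shorter than the filter width")
        activations = []
        # Slide the filter across every valid window of the sequence.
        for start in range(len(x) - self.width + 1):
            s = self.bias
            for k in range(self.width):
                for c in range(len(BASES)):
                    s += self.kernel[k][c] * x[start + k][c]
            activations.append(max(0.0, s))  # ReLU
        return sum(activations) / len(activations)  # global average pooling


def variant_impact(model, ref_seq, pos, alt):
    """Assumed impact score of a single-nucleotide variant:
    prediction(alternate sequence) - prediction(reference sequence)."""
    if ref_seq[pos] == alt:
        return 0.0
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return model.predict(alt_seq) - model.predict(ref_seq)


def saturation_mutagenesis(model, ref_seq):
    """Score every substitution of an uppercase A/C/G/T sequence: {(pos, alt): delta}."""
    return {
        (i, b): variant_impact(model, ref_seq, i, b)
        for i in range(len(ref_seq))
        for b in BASES
        if b != ref_seq[i]
    }


if __name__ == "__main__":
    model = ToyConvModel(width=5, seed=42)
    ref = "ACGTACGTTGCAACGTAGCT"
    scores = saturation_mutagenesis(model, ref)
    (pos, alt), delta = max(scores.items(), key=lambda kv: abs(kv[1]))
    print(f"largest predicted effect: {ref[pos]}>{alt} at position {pos}, delta={delta:.4f}")
```

Because `variant_impact` only calls `predict(seq)`, the toy model could be swapped for any trained model exposing that method; running it once per tissue-specific model would give one score per tissue, which is one plausible (assumed) way to organize tissue-specific scores.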
