An unsupervised deep learning framework for predicting human essential genes from population and functional genomic data

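Bibliographic Details
Main Authors: Troy M. LaPolice, Yi-Fei Huang
Affiliation: Department of Biology, Pennsylvania State University
Format: Article
Language: English
Published: BMC, 2023-09-01
Series: BMC Bioinformatics
ISSN: 1471-2105
Subjects: Deep Learning; Unsupervised; Essential Genes; Loss of Function Intolerance; Population Genomics; Functional Genomics
Online Access: https://doi.org/10.1186/s12859-023-05481-z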

Abstract

Background: The ability to accurately predict essential genes, which are intolerant to loss-of-function (LOF) mutations, can dramatically improve the identification of disease-associated genes. Numerous computational methods have recently been developed to predict human essential genes from population genomic data. While existing methods are highly predictive of long essential genes, they have limited power to pinpoint short essential genes because polymorphisms are sparse in the human genome.

Results: Motivated by the premise that population and functional genomic data may provide complementary evidence for gene essentiality, we present DeepLOF, an evolution-based deep learning model that predicts essential genes in an unsupervised manner. Unlike previous population genetic methods, DeepLOF uses a novel deep learning framework to integrate population and functional genomic data, allowing it to pinpoint short essential genes that can hardly be predicted from population genomic data alone. DeepLOF outperforms previous methods in predicting ClinGen haploinsufficient genes, mouse essential genes, and essential genes in human cell lines. Notably, at a false positive rate of 5%, DeepLOF detects 50% more ClinGen haploinsufficient genes than previous methods. Furthermore, DeepLOF discovers 109 novel essential genes that are too short to be identified by previous methods.

Conclusion: The predictive power of DeepLOF makes it a compelling computational method to aid in the discovery of essential genes.
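
The headline comparison in the abstract is stated as sensitivity (true positive rate) at a fixed false positive rate. As a minimal sketch, and not the authors' evaluation code, the Python function below shows one common way to compute that quantity from per-gene scores and binary labels. The function name `tpr_at_fpr`, the toy scores, and the choice to treat tied scores as a single threshold are illustrative assumptions.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.05):
    """Return the highest true positive rate reachable while FPR <= max_fpr.

    scores: predicted essentiality scores (higher = more likely essential)
    labels: 1 for a known positive (e.g. essential) gene, 0 for a negative gene
    (Hypothetical helper for illustration only.)
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative gene")

    # Rank genes from the most to the least likely essential.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    best_tpr = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Lower the threshold to the next distinct score, admitting all tied genes at once.
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # FPR only grows as the threshold drops, so stop at the first violation.
        if fp / n_neg > max_fpr:
            break
        best_tpr = tp / n_pos
    return best_tpr


# Toy data (hypothetical): 5 positive genes and 20 negative genes.
scores = [0.95, 0.9, 0.6, 0.3, 0.2] + [0.85, 0.5] + [0.1] * 18
labels = [1] * 5 + [0] * 20
print(tpr_at_fpr(scores, labels, max_fpr=0.05))
```

On this hypothetical data, a 5% FPR allows one false positive out of 20 negatives, so three of the five positives are recovered (TPR 0.6). If a second method reached only 0.4 at the same FPR, the first would detect 50% more positives (0.6 / 0.4 = 1.5), which is the kind of relative comparison the abstract reports.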