Patterns of transcription factor binding and epigenome at promoters allow interpretable predictability of multiple functions of non-coding and coding genes

Understanding the biological roles of all genes only through experimental methods is challenging. A computational approach with reliable interpretability is needed to infer the function of genes, particularly for non-coding RNAs. We have analyzed genomic features that are present across both coding...

Bibliographic Details
Main Authors: Omkar Chandra, Madhu Sharma, Neetesh Pandey, Indra Prakash Jha, Shreya Mishra, Say Li Kong, Vibhor Kumar
Format: Article
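Language: English
Published: Elsevier 2023-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: Functional genomics; Long noncoding RNA (long ncRNA, LncRNA); Gene regulation; General transcription factor (GTF); Epigenetics
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037023002453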
author Omkar Chandra
Madhu Sharma
Neetesh Pandey
Indra Prakash Jha
Shreya Mishra
Say Li Kong
Vibhor Kumar
collection DOAJ
description Understanding the biological roles of all genes through experimental methods alone is challenging. A computational approach with reliable interpretability is needed to infer the function of genes, particularly for non-coding RNAs. We have analyzed genomic features that are present across both coding and non-coding genes, such as transcription factor (TF) and cofactor ChIP-seq (n = 823), histone modification ChIP-seq (n = 621), cap analysis of gene expression (CAGE) tags (n = 255), and DNase hypersensitivity profiles (n = 255), to predict ontology-based functions of genes. Our approach for gene function prediction was reliable (>90% balanced accuracy) for 486 gene-sets. PubMed abstract mining and CRISPR screens supported the inferred association of genes with biological functions for which our method had high accuracy. Further analysis revealed that TF-binding patterns at promoters have high predictive strength for multiple functions. TF-binding patterns at the promoter add an unexplored dimension of explainable regulatory aspects of genes and their functions. Therefore, we performed a comprehensive analysis of the functional specificity of TF-binding patterns at promoters and used them for clustering functions to reveal many latent groups of gene-sets involved in common major cellular processes. We also showed how our approach could be used to infer the functions of non-coding genes using the CRISPR screens of coding genes; these inferred functions were validated using a long non-coding RNA CRISPR screen. Thus, our results demonstrated the generality of our approach by using gene-sets from CRISPR screens. Overall, our approach opens an avenue for predicting the involvement of non-coding genes in various functions.
first_indexed 2024-03-08T21:29:50Z
format Article
id doaj.art-1935610f724b452289021da80c17e778
institution Directory Open Access Journal
issn 2001-0370
language English
last_indexed 2024-03-08T21:29:50Z
publishDate 2023-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj.art-1935610f724b452289021da80c17e778 2023-12-21T07:31:46Z eng
Elsevier, Computational and Structural Biotechnology Journal, ISSN 2001-0370, 2023-01-01, vol. 21, pp. 3590-3603
Omkar Chandra, Madhu Sharma, Neetesh Pandey, Indra Prakash Jha, Shreya Mishra, Vibhor Kumar (corresponding author): Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi, India
Say Li Kong: Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
title Patterns of transcription factor binding and epigenome at promoters allow interpretable predictability of multiple functions of non-coding and coding genes
topic Functional genomics
Long noncoding RNA (long ncRNA, LncRNA)
Gene regulation
General transcription factor (GTF)
Epigenetics
url http://www.sciencedirect.com/science/article/pii/S2001037023002453