Patterns of transcription factor binding and epigenome at promoters allow interpretable predictability of multiple functions of non-coding and coding genes
Understanding the biological roles of all genes only through experimental methods is challenging. A computational approach with reliable interpretability is needed to infer the function of genes, particularly for non-coding RNAs. We have analyzed genomic features that are present across both coding...
Main Authors: | Omkar Chandra, Madhu Sharma, Neetesh Pandey, Indra Prakash Jha, Shreya Mishra, Say Li Kong, Vibhor Kumar |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2023-01-01 |
Series: | Computational and Structural Biotechnology Journal |
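Subjects: | Functional genomics; Long noncoding RNA (long ncRNA, LncRNA); Gene regulation; General transcription factor (GTF); Epigenetics |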
Online Access: | http://www.sciencedirect.com/science/article/pii/S2001037023002453 |
_version_ | 1797384048377593856 |
---|---|
author | Omkar Chandra, Madhu Sharma, Neetesh Pandey, Indra Prakash Jha, Shreya Mishra, Say Li Kong, Vibhor Kumar |
author_facet | Omkar Chandra, Madhu Sharma, Neetesh Pandey, Indra Prakash Jha, Shreya Mishra, Say Li Kong, Vibhor Kumar |
author_sort | Omkar Chandra |
collection | DOAJ |
description | Understanding the biological roles of all genes only through experimental methods is challenging. A computational approach with reliable interpretability is needed to infer the function of genes, particularly for non-coding RNAs. We have analyzed genomic features that are present across both coding and non-coding genes like transcription factor (TF) and cofactor ChIP-seq (n = 823), histone modifications ChIP-seq (n = 621), cap analysis gene expression (CAGE) tags (n = 255), and DNase hypersensitivity profiles (n = 255) to predict ontology-based functions of genes. Our approach for gene function prediction was reliable (>90% balanced accuracy) for 486 gene-sets. PubMed abstract mining and CRISPR screens supported the inferred association of genes with biological functions, for which our method had high accuracy. Further analysis revealed that TF-binding patterns at promoters have high predictive strength for multiple functions. TF-binding patterns at the promoter add an unexplored dimension of explainable regulatory aspects of genes and their functions. Therefore, we performed a comprehensive analysis for the functional-specificity of TF-binding patterns at promoters and used them for clustering functions to reveal many latent groups of gene-sets involved in common major cellular processes. We also showed how our approach could be used to infer the functions of non-coding genes using the CRISPR screens of coding genes, which were validated using a long non-coding RNA CRISPR screen. Thus, our results demonstrated the generality of our approach by using gene-sets from CRISPR screens. Overall, our approach opens an avenue for predicting the involvement of non-coding genes in various functions. |
first_indexed | 2024-03-08T21:29:50Z |
format | Article |
id | doaj.art-1935610f724b452289021da80c17e778 |
institution | Directory Open Access Journal |
issn | 2001-0370 |
language | English |
last_indexed | 2024-03-08T21:29:50Z |
publishDate | 2023-01-01 |
publisher | Elsevier |
record_format | Article |
series | Computational and Structural Biotechnology Journal |
spelling | doaj.art-1935610f724b452289021da80c17e778 2023-12-21T07:31:46Z eng Elsevier Computational and Structural Biotechnology Journal 2001-0370 2023-01-01 21 3590 3603 Patterns of transcription factor binding and epigenome at promoters allow interpretable predictability of multiple functions of non-coding and coding genes Omkar Chandra [0], Madhu Sharma [1], Neetesh Pandey [2], Indra Prakash Jha [3], Shreya Mishra [4], Say Li Kong [5], Vibhor Kumar [6] [0] Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi, India [1] Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi, India [2] Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi, India [3] Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi, India [4] Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi, India [5] Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore [6] Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi, India; Corresponding author. Understanding the biological roles of all genes only through experimental methods is challenging. A computational approach with reliable interpretability is needed to infer the function of genes, particularly for non-coding RNAs. We have analyzed genomic features that are present across both coding and non-coding genes like transcription factor (TF) and cofactor ChIP-seq (823), histone modifications ChIP-seq (n = 621), cap analysis gene expression (CAGE) tags (n = 255), and DNase hypersensitivity profiles (n = 255) to predict ontology-based functions of genes. Our approach for gene function prediction was reliable (>90% balanced accuracy) for 486 gene-sets. PubMed abstract mining and CRISPR screens supported the inferred association of genes with biological functions, for which our method had high accuracy. Further analysis revealed that TF-binding patterns at promoters have high predictive strength for multiple functions. TF-binding patterns at the promoter add an unexplored dimension of explainable regulatory aspects of genes and their functions. Therefore, we performed a comprehensive analysis for the functional-specificity of TF-binding patterns at promoters and used them for clustering functions to reveal many latent groups of gene-sets involved in common major cellular processes. We also showed how our approach could be used to infer the functions of non-coding genes using the CRISPR screens of coding genes, which were validated using a long non-coding RNA CRISPR screen. Thus our results demonstrated the generality of our approach by using gene-sets from CRISPR screens. Overall, our approach opens an avenue for predicting the involvement of non-coding genes in various functions. http://www.sciencedirect.com/science/article/pii/S2001037023002453 Functional genomics; Long noncoding RNA (long ncRNA, LncRNA); Gene regulation; General transcription factor (GTF); Epigenetics |
spellingShingle | Omkar Chandra, Madhu Sharma, Neetesh Pandey, Indra Prakash Jha, Shreya Mishra, Say Li Kong, Vibhor Kumar; Patterns of transcription factor binding and epigenome at promoters allow interpretable predictability of multiple functions of non-coding and coding genes; Computational and Structural Biotechnology Journal; Functional genomics; Long noncoding RNA (long ncRNA, LncRNA); Gene regulation; General transcription factor (GTF); Epigenetics |
title | Patterns of transcription factor binding and epigenome at promoters allow interpretable predictability of multiple functions of non-coding and coding genes |
title_full | Patterns of transcription factor binding and epigenome at promoters allow interpretable predictability of multiple functions of non-coding and coding genes |
title_fullStr | Patterns of transcription factor binding and epigenome at promoters allow interpretable predictability of multiple functions of non-coding and coding genes |
title_full_unstemmed | Patterns of transcription factor binding and epigenome at promoters allow interpretable predictability of multiple functions of non-coding and coding genes |
title_short | Patterns of transcription factor binding and epigenome at promoters allow interpretable predictability of multiple functions of non-coding and coding genes |
title_sort | patterns of transcription factor binding and epigenome at promoters allow interpretable predictability of multiple functions of non coding and coding genes |
topic | Functional genomics; Long noncoding RNA (long ncRNA, LncRNA); Gene regulation; General transcription factor (GTF); Epigenetics |
url | http://www.sciencedirect.com/science/article/pii/S2001037023002453 |