Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns

Abstract Background: The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been succes...

Bibliographic Details
Main Authors: Maria Osmala, Harri Lähdesmäki
Format: Article
Language: English
Published: BMC 2020-07-01
Series: BMC Bioinformatics
Subjects: Enhancer; Probabilistic modelling; Classifier; ChIP-seq; Coverage pattern
Online Access: http://link.springer.com/article/10.1186/s12859-020-03621-3
_version_ 1818528795038056448
author Maria Osmala
Harri Lähdesmäki
collection DOAJ
description Abstract Background: The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been successfully adopted for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, the current methods predict different numbers and different sets of enhancers for the same cell type and do not utilise the pattern of the ChIP-seq coverage profiles efficiently. Results: In this work, we propose a PRobabilistic Enhancer PRedictIoN Tool (PREPRINT) that assumes characteristic coverage patterns of chromatin features at enhancers and employs a statistical model to account for their variability. PREPRINT defines probabilistic distance measures to quantify the similarity between genomic query regions and the characteristic coverage patterns. The probabilistic scores of the enhancer and non-enhancer samples are utilised to train a kernel-based classifier. The performance of the method is demonstrated on ENCODE data for two cell lines. The predicted enhancers are computationally validated based on the transcriptional regulatory protein binding sites and compared to the predictions obtained by state-of-the-art methods. Conclusion: PREPRINT performs favourably compared to the state-of-the-art methods, especially when the methods are required to predict a larger set of enhancers. PREPRINT generalises successfully to data from a cell type not utilised for training, and often performs better than the previous methods. The PREPRINT enhancers are less sensitive to the choice of prediction threshold. PREPRINT identifies biologically validated enhancers not predicted by the competing methods. The enhancers predicted by PREPRINT can aid genome interpretation in functional genomics and clinical studies.
first_indexed 2024-12-11T06:54:39Z
format Article
id doaj.art-945feed892c642629ee4a16ab6276957
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-11T06:54:39Z
publishDate 2020-07-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-945feed892c642629ee4a16ab6276957 2022-12-22T01:16:48Z eng BMC BMC Bioinformatics 1471-2105 2020-07-01 21 1 1 37 10.1186/s12859-020-03621-3 Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns Maria Osmala (Department of Computer Science, Aalto University) Harri Lähdesmäki (Department of Computer Science, Aalto University) http://link.springer.com/article/10.1186/s12859-020-03621-3 Enhancer; Probabilistic modelling; Classifier; ChIP-seq; Coverage pattern
title Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns
topic Enhancer
Probabilistic modelling
Classifier
ChIP-seq
Coverage pattern
url http://link.springer.com/article/10.1186/s12859-020-03621-3