Analysing the Protein-DNA Binding Sites in <i>Arabidopsis thaliana</i> from ChIP-seq Experiments

Computational genomics aims to support the discovery of how the functionality of the genome of the organism under study is affected both by its own sequence and structure and by the network of interactions between this genome and different biological or physical factors. In this work, we focus on...

Bibliographic Details
Main Authors: Ginés Almagro-Hernández, Juana-María Vivo, Manuel Franco, Jesualdo Tomás Fernández-Breis
Format: Article
Language: English
Published: MDPI AG 2021-12-01
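Series: Mathematics
Subjects: bioinformatics; computational genomics; ChIP-seq experiment; protein binding functional regions; multivariate hypergeometric distribution
Online Access: https://www.mdpi.com/2227-7390/9/24/3239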
_version_ 1797502630094700544
author Ginés Almagro-Hernández
Juana-María Vivo
Manuel Franco
Jesualdo Tomás Fernández-Breis
author_facet Ginés Almagro-Hernández
Juana-María Vivo
Manuel Franco
Jesualdo Tomás Fernández-Breis
author_sort Ginés Almagro-Hernández
collection DOAJ
description Computational genomics aims to support the discovery of how the functionality of the genome of the organism under study is affected both by its own sequence and structure and by the network of interactions between this genome and different biological or physical factors. In this work, we focus on the analysis of ChIP-seq data, for which many methods have been proposed in recent years. However, to the best of our knowledge, those methods lack an appropriate mathematical formalism. We have developed a method based on multivariate models for the analysis of the set of peaks obtained from a ChIP-seq experiment. This method can be used to characterize an individual experiment and to compare different experiments regardless of where and when they were conducted. The method is based on a multivariate hypergeometric distribution, which fits the complexity of the biological data and is better suited to dealing with the uncertainty generated in this type of experiment than the dichotomous models used by state-of-the-art methods. We have validated this method with <i>Arabidopsis thaliana</i> datasets obtained from the Remap2020 database, obtaining results in accordance with the original study of these samples. Our work shows a novel way of analyzing ChIP-seq data.
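The description names the multivariate hypergeometric distribution without stating it. As a rough illustration only, and not the authors' implementation, the sketch below computes its probability mass function: the chance of drawing a given count from each of several categories when sampling without replacement. The function name, the three categories, and all counts are hypothetical and chosen purely for the example.

```python
from math import comb, prod


def multivariate_hypergeom_pmf(draws, population):
    """Probability of drawing exactly draws[i] items from category i.

    Sampling is without replacement from a population that holds
    population[i] items in category i; the sample size is sum(draws).
    """
    if len(draws) != len(population):
        raise ValueError("draws and population must have the same length")
    # An outcome that asks for more items than a category holds is impossible.
    if any(k < 0 or k > size for k, size in zip(draws, population)):
        return 0.0
    sample_size = sum(draws)
    total = sum(population)
    ways = prod(comb(size, k) for size, k in zip(population, draws))
    return ways / comb(total, sample_size)


# Hypothetical counts: three categories of regions, six regions sampled.
population = [5, 10, 15]
draws = [2, 2, 2]
p = multivariate_hypergeom_pmf(draws, population)
print(f"P(draws={draws}) = {p:.6f}")
```

With only two categories the same function reduces to the ordinary (dichotomous) hypergeometric distribution, which is the kind of model the description contrasts the multivariate one against.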
first_indexed 2024-03-10T03:36:55Z
format Article
id doaj.art-29771a5169314743a33ca85f45b65e71
institution Directory Open Access Journal
issn 2227-7390
language English
last_indexed 2024-03-10T03:36:55Z
publishDate 2021-12-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj.art-29771a5169314743a33ca85f45b65e71
2023-11-23T09:26:20Z
eng
MDPI AG
Mathematics
2227-7390
2021-12-01
9
24
3239
10.3390/math9243239
Analysing the Protein-DNA Binding Sites in <i>Arabidopsis thaliana</i> from ChIP-seq Experiments
Ginés Almagro-Hernández 0
Juana-María Vivo 1
Manuel Franco 2
Jesualdo Tomás Fernández-Breis 3
Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, 30100 Murcia, Spain
Instituto Murciano de Investigación Biosanitaria (IMIB-Arrixaca), 30120 Murcia, Spain
Instituto Murciano de Investigación Biosanitaria (IMIB-Arrixaca), 30120 Murcia, Spain
Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, 30100 Murcia, Spain
Computational genomics aims to support the discovery of how the functionality of the genome of the organism under study is affected both by its own sequence and structure and by the network of interactions between this genome and different biological or physical factors. In this work, we focus on the analysis of ChIP-seq data, for which many methods have been proposed in recent years. However, to the best of our knowledge, those methods lack an appropriate mathematical formalism. We have developed a method based on multivariate models for the analysis of the set of peaks obtained from a ChIP-seq experiment. This method can be used to characterize an individual experiment and to compare different experiments regardless of where and when they were conducted. The method is based on a multivariate hypergeometric distribution, which fits the complexity of the biological data and is better suited to dealing with the uncertainty generated in this type of experiment than the dichotomous models used by state-of-the-art methods. We have validated this method with <i>Arabidopsis thaliana</i> datasets obtained from the Remap2020 database, obtaining results in accordance with the original study of these samples. Our work shows a novel way of analyzing ChIP-seq data.
https://www.mdpi.com/2227-7390/9/24/3239
bioinformatics
computational genomics
ChIP-seq experiment
protein binding functional regions
multivariate hypergeometric distribution
title Analysing the Protein-DNA Binding Sites in <i>Arabidopsis thaliana</i> from ChIP-seq Experiments
topic bioinformatics
computational genomics
ChIP-seq experiment
protein binding functional regions
multivariate hypergeometric distribution
url https://www.mdpi.com/2227-7390/9/24/3239
work_keys_str_mv AT ginesalmagrohernandez analysingtheproteindnabindingsitesiniarabidopsisthalianaifromchipseqexperiments
AT juanamariavivo analysingtheproteindnabindingsitesiniarabidopsisthalianaifromchipseqexperiments
AT manuelfranco analysingtheproteindnabindingsitesiniarabidopsisthalianaifromchipseqexperiments
AT jesualdotomasfernandezbreis analysingtheproteindnabindingsitesiniarabidopsisthalianaifromchipseqexperiments