Analysing the Protein-DNA Binding Sites in <i>Arabidopsis thaliana</i> from ChIP-seq Experiments
Computational genomics aims to support the discovery of how the functionality of the genome of the organism under study is affected both by its own sequence and structure, and by the network of interaction between this genome and different biological or physical factors. In this work, we focus on...
Main Authors: | Ginés Almagro-Hernández, Juana-María Vivo, Manuel Franco, Jesualdo Tomás Fernández-Breis |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-12-01 |
Series: | Mathematics |
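Subjects: | bioinformatics, computational genomics, ChIP-seq experiment, protein binding functional regions, multivariate hypergeometric distribution |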
Online Access: | https://www.mdpi.com/2227-7390/9/24/3239 |
_version_ | 1797502630094700544 |
---|---|
author | Ginés Almagro-Hernández; Juana-María Vivo; Manuel Franco; Jesualdo Tomás Fernández-Breis |
author_facet | Ginés Almagro-Hernández; Juana-María Vivo; Manuel Franco; Jesualdo Tomás Fernández-Breis |
author_sort | Ginés Almagro-Hernández |
collection | DOAJ |
description | Computational genomics aims to support the discovery of how the functionality of the genome of the organism under study is affected both by its own sequence and structure, and by the network of interaction between this genome and different biological or physical factors. In this work, we focus on the analysis of ChIP-seq data, for which many methods have been proposed in recent years. However, to the best of our knowledge, those methods lack an appropriate mathematical formalism. We have developed a method based on multivariate models for the analysis of the set of peaks obtained from a ChIP-seq experiment. This method can be used to characterize an individual experiment and to compare different experiments regardless of where and when they were conducted. The method is based on a multivariate hypergeometric distribution, which fits the complexity of the biological data and is better suited to deal with the uncertainty generated in this type of experiment than the dichotomous models used by state-of-the-art methods. We have validated this method with <i>Arabidopsis thaliana</i> datasets obtained from the Remap2020 database, obtaining results in accordance with the original study of these samples. Our work shows a novel way of analyzing ChIP-seq data. |
first_indexed | 2024-03-10T03:36:55Z |
format | Article |
id | doaj.art-29771a5169314743a33ca85f45b65e71 |
institution | Directory of Open Access Journals |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-10T03:36:55Z |
publishDate | 2021-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-29771a5169314743a33ca85f45b65e71; 2023-11-23T09:26:20Z; eng; MDPI AG; Mathematics; 2227-7390; 2021-12-01; 9; 24; 3239; 10.3390/math9243239; Analysing the Protein-DNA Binding Sites in <i>Arabidopsis thaliana</i> from ChIP-seq Experiments; Ginés Almagro-Hernández (0), Juana-María Vivo (1), Manuel Franco (2), Jesualdo Tomás Fernández-Breis (3); Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, 30100 Murcia, Spain; Instituto Murciano de Investigación Biosanitaria (IMIB-Arrixaca), 30120 Murcia, Spain; Instituto Murciano de Investigación Biosanitaria (IMIB-Arrixaca), 30120 Murcia, Spain; Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, 30100 Murcia, Spain; Computational genomics aims to support the discovery of how the functionality of the genome of the organism under study is affected both by its own sequence and structure, and by the network of interaction between this genome and different biological or physical factors. In this work, we focus on the analysis of ChIP-seq data, for which many methods have been proposed in recent years. However, to the best of our knowledge, those methods lack an appropriate mathematical formalism. We have developed a method based on multivariate models for the analysis of the set of peaks obtained from a ChIP-seq experiment. This method can be used to characterize an individual experiment and to compare different experiments regardless of where and when they were conducted. The method is based on a multivariate hypergeometric distribution, which fits the complexity of the biological data and is better suited to deal with the uncertainty generated in this type of experiment than the dichotomous models used by state-of-the-art methods. We have validated this method with <i>Arabidopsis thaliana</i> datasets obtained from the Remap2020 database, obtaining results in accordance with the original study of these samples. Our work shows a novel way of analyzing ChIP-seq data.; https://www.mdpi.com/2227-7390/9/24/3239; bioinformatics; computational genomics; ChIP-seq experiment; protein binding functional regions; multivariate hypergeometric distribution |
title | Analysing the Protein-DNA Binding Sites in <i>Arabidopsis thaliana</i> from ChIP-seq Experiments |
title_full | Analysing the Protein-DNA Binding Sites in <i>Arabidopsis thaliana</i> from ChIP-seq Experiments |
title_fullStr | Analysing the Protein-DNA Binding Sites in <i>Arabidopsis thaliana</i> from ChIP-seq Experiments |
title_full_unstemmed | Analysing the Protein-DNA Binding Sites in <i>Arabidopsis thaliana</i> from ChIP-seq Experiments |
title_short | Analysing the Protein-DNA Binding Sites in <i>Arabidopsis thaliana</i> from ChIP-seq Experiments |
topic | bioinformatics; computational genomics; ChIP-seq experiment; protein binding functional regions; multivariate hypergeometric distribution |
url | https://www.mdpi.com/2227-7390/9/24/3239 |
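
The description above names the multivariate hypergeometric distribution as the basis of the method. As a rough illustration of that distribution only, and not of the authors' actual procedure, the following Python sketch computes its probability mass function: the chance of drawing a given number of items from each of several categories when sampling without replacement. The function name, the category sizes, and the counts are all hypothetical and chosen purely for the example.

```python
from math import comb, prod


def multivariate_hypergeom_pmf(counts, category_sizes):
    """Probability of drawing counts[i] items from category i
    when sum(counts) items are sampled without replacement.

    Hypothetical helper for illustration; not taken from the article.
    """
    if len(counts) != len(category_sizes):
        raise ValueError("counts and category_sizes must have the same length")
    # A draw that exceeds a category's size (or is negative) is impossible.
    if any(k < 0 or k > size for k, size in zip(counts, category_sizes)):
        return 0.0
    total = sum(category_sizes)   # population size N
    drawn = sum(counts)           # sample size n
    ways = prod(comb(size, k) for size, k in zip(category_sizes, counts))
    return ways / comb(total, drawn)


# Made-up numbers: three categories of sizes 5, 10 and 15,
# from which 1, 2 and 3 items are drawn respectively.
category_sizes = [5, 10, 15]
counts = [1, 2, 3]
p = multivariate_hypergeom_pmf(counts, category_sizes)
print(f"P(counts={counts}) = {p:.6f}")
```

With more than two categories, a model of this kind can describe outcomes that a dichotomous (two-category) hypergeometric model would have to collapse into "in" versus "out". How the article defines its categories is not stated in this record.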