A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs

Bibliographic Details
Main Authors: Seitzer Phillip, Wilbanks Elizabeth G, Larsen David J, Facciotti Marc T
Format: Article
Language: English
Published: BMC 2012-11-01
Series: BMC Bioinformatics
Subjects: Motif; Monte Carlo; ChIP-seq; ChIP-chip; Comparative genomics; MEME; STAMP; TFB
Online Access: http://www.biomedcentral.com/1471-2105/13/317
_version_ 1818060959832342528
author Seitzer Phillip
Wilbanks Elizabeth G
Larsen David J
Facciotti Marc T
collection DOAJ
description Abstract
Background: Discovering functionally significant, short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Often, not all sequences in the set contain a motif, and these non-motif-containing sequences complicate algorithmic motif discovery. Filtering them from the larger set while simultaneously determining the identity of the motif is therefore a desirable but non-trivial problem in motif discovery research.
Results: We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to gain additional insights. In all cases, we were able to support our findings with experimental work from the literature.
Conclusions: Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggest binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets. We suggest that small differences in our discovered motif could confer specificity for one or more homologous GTF proteins. We offer a free implementation of the MotifCatcher software package at http://www.bme.ucdavis.edu/facciotti/resources_data/software/.
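The random-sampling idea mentioned in the abstract can be illustrated with a toy sketch. The Python below is not MotifCatcher and does not call MEME or any other real motif finder. As a stand-in it uses a hypothetical helper, most_common_kmer, that returns the exact k-mer present in the largest number of sequences. The names monte_carlo_filter, subset_size, and n_runs are likewise assumptions made for illustration only. The sketch runs the stand-in finder on many random subsets, takes the motif reported most often, and keeps only the sequences that contain it.

```python
import random
from collections import Counter


def most_common_kmer(seqs, k):
    """Toy stand-in for a motif finder: the k-mer found in the most sequences."""
    counts = Counter()
    for s in seqs:
        # Use a set so each sequence votes at most once per k-mer.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    # Break ties deterministically: highest count first, then alphabetical.
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]


def monte_carlo_filter(seqs, k, subset_size, n_runs, seed=0):
    """Repeatedly sample subsets, vote on a motif, then filter the full set."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_runs):
        subset = rng.sample(seqs, subset_size)  # sampling without replacement
        votes[most_common_kmer(subset, k)] += 1
    consensus = votes.most_common(1)[0][0]
    # Keep only the sequences that contain the consensus motif.
    kept = [s for s in seqs if consensus in s]
    return consensus, kept


if __name__ == "__main__":
    motif = "TGACTCA"  # planted motif, chosen for illustration only
    sequences = [
        "AA" + motif + "AA", "ACGTACGTACG", "CC" + motif + "CC",
        "GGGGGCCCCCC", "GG" + motif + "GG", "TTTTTAAAAAA",
        "TT" + motif + "TT", "CACACACACAC", "CA" + motif + "GT",
        "GC" + motif + "TA",
    ]
    found, kept = monte_carlo_filter(sequences, k=7, subset_size=7, n_runs=20)
    print(found, len(kept))
```

A real motif finder reports degenerate motifs (for example, position weight matrices) rather than exact k-mers, so the exact-match filter here is a deliberate simplification.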
first_indexed 2024-12-10T13:40:43Z
format Article
id doaj.art-ea4f8a46b13a494d98d877a522ebd4f0
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-10T13:40:43Z
publishDate 2012-11-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-ea4f8a46b13a494d98d877a522ebd4f0 | 2022-12-22T01:46:42Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2012-11-01 | 13 | 1 | 317 | 10.1186/1471-2105-13-317 | A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs | Seitzer Phillip; Wilbanks Elizabeth G; Larsen David J; Facciotti Marc T | http://www.biomedcentral.com/1471-2105/13/317 | Motif; Monte Carlo; ChIP-seq; ChIP-chip; Comparative genomics; MEME; STAMP; TFB
title A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs
topic Motif
Monte Carlo
ChIP-seq
ChIP-chip
Comparative genomics
MEME
STAMP
TFB
url http://www.biomedcentral.com/1471-2105/13/317