Learning "graph-mer" motifs that predict gene expression trajectories in development.

A key problem in understanding transcriptional regulatory networks is deciphering what cis regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profi...

Bibliographic Details
Main Authors: Xuejing Li, Casandra Panea, Chris H Wiggins, Valerie Reinke, Christina Leslie
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2010-04-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC2861633?pdf=render
_version_ 1828300246319366144
author Xuejing Li
Casandra Panea
Chris H Wiggins
Valerie Reinke
Christina Leslie
author_sort Xuejing Li
collection DOAJ
description A key problem in understanding transcriptional regulatory networks is deciphering what cis regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profiles and then searching for overrepresented motifs in the promoter sequences of genes in a cluster. However, genes with similar expression profiles may be controlled by distinct regulatory programs. Moreover, if many gene expression profiles in a data set are highly correlated, as in the case of whole-organism developmental time series, it may be difficult to resolve fine-grained clusters in the first place. We present a predictive framework for modeling the natural flow of information, from promoter sequence to expression, to learn cis regulatory motifs and characterize gene expression patterns in developmental time courses. We introduce a cluster-free algorithm based on a graph-regularized version of partial least squares (PLS) regression to learn sequence patterns (represented by graphs of k-mers, or "graph-mers") that predict gene expression trajectories. Applying the approach to wild-type germline development in Caenorhabditis elegans, we found that the first and second latent PLS factors mapped to expression profiles for oocyte and sperm genes, respectively. We extracted both known and novel motifs from the graph-mers associated with these germline-specific patterns, including novel CG-rich motifs specific to oocyte genes. We found evidence supporting the functional relevance of these putative regulatory elements through analysis of positional bias, motif conservation and in situ gene expression. This study demonstrates that our regression model can learn biologically meaningful latent structure and identify potentially functional motifs from subtle developmental time-course expression data.
first_indexed 2024-04-13T13:06:41Z
format Article
id doaj.art-8f9b13f146d9491184cfa67bc4c4e84c
institution Directory Open Access Journal
issn 1553-734X
1553-7358
language English
last_indexed 2024-04-13T13:06:41Z
publishDate 2010-04-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
spelling doaj.art-8f9b13f146d9491184cfa67bc4c4e84c; 2022-12-22T02:45:45Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2010-04-01; 6; 4; e1000761; 10.1371/journal.pcbi.1000761; Learning "graph-mer" motifs that predict gene expression trajectories in development.; Xuejing Li; Casandra Panea; Chris H Wiggins; Valerie Reinke; Christina Leslie; http://europepmc.org/articles/PMC2861633?pdf=render
title Learning "graph-mer" motifs that predict gene expression trajectories in development.
url http://europepmc.org/articles/PMC2861633?pdf=render