Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE

Transcription factors are crucial regulators of gene expression. Accurate quantitative definition of their intrinsic DNA binding preferences is critical to understanding their biological function. High-throughput in vitro technology has recently been used to deeply probe the DNA binding specificity...

Bibliographic Details
Main Authors: Todd R Riley, Allan Lazarovici, Richard S Mann, Harmen J Bussemaker
Format: Article
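Language: English
Published: eLife Sciences Publications Ltd 2015-12-01
Series: eLife
Subjects: transcription factor; protein binding microarray technology; biophysical model; DNA binding specificity
Online Access: https://elifesciences.org/articles/06397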
_version_ 1828375807245942784
author Todd R Riley
Allan Lazarovici
Richard S Mann
Harmen J Bussemaker
collection DOAJ
description Transcription factors are crucial regulators of gene expression. Accurate quantitative definition of their intrinsic DNA binding preferences is critical to understanding their biological function. High-throughput in vitro technology has recently been used to deeply probe the DNA binding specificity of hundreds of eukaryotic transcription factors, yet algorithms for analyzing such data have not yet fully matured. Here, we present a general framework (FeatureREDUCE) for building sequence-to-affinity models based on a biophysically interpretable and extensible model of protein-DNA interaction that can account for dependencies between nucleotides within the binding interface or multiple modes of binding. When training on protein binding microarray (PBM) data, we use robust regression and modeling of technology-specific biases to infer specificity models of unprecedented accuracy and precision. We provide quantitative validation of our results by comparing to gold-standard data when available.
first_indexed 2024-04-14T07:51:55Z
format Article
id doaj.art-9872cee3b58d49de94f316757eaaa0d7
institution Directory Open Access Journal
issn 2050-084X
language English
last_indexed 2024-04-14T07:51:55Z
publishDate 2015-12-01
publisher eLife Sciences Publications Ltd
record_format Article
series eLife
spelling doaj.art-9872cee3b58d49de94f316757eaaa0d7
2022-12-22T02:05:10Z
eng
eLife Sciences Publications Ltd
eLife
2050-084X
2015-12-01
4
10.7554/eLife.06397
Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE
Todd R Riley: Department of Biological Sciences, Columbia University, New York, United States; Department of Systems Biology, Columbia University, New York, United States; Department of Biology, University of Massachusetts Boston, Boston, United States
Allan Lazarovici: Department of Biological Sciences, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
Richard S Mann: Department of Systems Biology, Columbia University, New York, United States; Department of Biochemistry and Molecular Biophysics, Columbia University, New York, United States
Harmen J Bussemaker: Department of Biological Sciences, Columbia University, New York, United States; Department of Systems Biology, Columbia University, New York, United States
https://elifesciences.org/articles/06397
transcription factor
protein binding microarray technology
biophysical model
DNA binding specificity
title Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE
topic transcription factor
protein binding microarray technology
biophysical model
DNA binding specificity
url https://elifesciences.org/articles/06397