Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE

Transcription factors are crucial regulators of gene expression. Accurate quantitative definition of their intrinsic DNA binding preferences is critical to understanding their biological function. High-throughput in vitro technology has recently been used to deeply probe the DNA binding specificity...

Bibliographic Details
Main Authors:	Todd R Riley, Allan Lazarovici, Richard S Mann, Harmen J Bussemaker
Format:	Article
Language:	English
Published:	eLife Sciences Publications Ltd 2015-12-01
Series:	eLife
Subjects:	transcription factor, protein binding microarray technology, biophysical model, DNA binding specificity
Online Access:	https://elifesciences.org/articles/06397

_version_	1828375807245942784
author	Todd R Riley, Allan Lazarovici, Richard S Mann, Harmen J Bussemaker
author_facet	Todd R Riley, Allan Lazarovici, Richard S Mann, Harmen J Bussemaker
author_sort	Todd R Riley
collection	DOAJ
description	Transcription factors are crucial regulators of gene expression. Accurate quantitative definition of their intrinsic DNA binding preferences is critical to understanding their biological function. High-throughput in vitro technology has recently been used to deeply probe the DNA binding specificity of hundreds of eukaryotic transcription factors, yet algorithms for analyzing such data have not yet fully matured. Here, we present a general framework (FeatureREDUCE) for building sequence-to-affinity models based on a biophysically interpretable and extensible model of protein-DNA interaction that can account for dependencies between nucleotides within the binding interface or multiple modes of binding. When training on protein binding microarray (PBM) data, we use robust regression and modeling of technology-specific biases to infer specificity models of unprecedented accuracy and precision. We provide quantitative validation of our results by comparing to gold-standard data when available.
first_indexed	2024-04-14T07:51:55Z
format	Article
id	doaj.art-9872cee3b58d49de94f316757eaaa0d7
institution	Directory of Open Access Journals
issn	2050-084X
language	English
last_indexed	2024-04-14T07:51:55Z
publishDate	2015-12-01
publisher	eLife Sciences Publications Ltd
record_format	Article
series	eLife
spelling	doaj.art-9872cee3b58d49de94f316757eaaa0d7 | 2022-12-22T02:05:10Z | eng | eLife Sciences Publications Ltd | eLife | 2050-084X | 2015-12-01 | 4 | 10.7554/eLife.06397 | Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE | Todd R Riley (0), Allan Lazarovici (1), Richard S Mann (2), Harmen J Bussemaker (3) | (0) Department of Biological Sciences, Columbia University, New York, United States; Department of Systems Biology, Columbia University, New York, United States; Department of Biology, University of Massachusetts Boston, Boston, United States | (1) Department of Biological Sciences, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States | (2) Department of Systems Biology, Columbia University, New York, United States; Department of Biochemistry and Molecular Biophysics, Columbia University, New York, United States | (3) Department of Biological Sciences, Columbia University, New York, United States; Department of Systems Biology, Columbia University, New York, United States | https://elifesciences.org/articles/06397 | transcription factor, protein binding microarray technology, biophysical model, DNA binding specificity
title	Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE
title_sort	building accurate sequence to affinity models from high throughput in vitro protein dna binding data using featurereduce
topic	transcription factor, protein binding microarray technology, biophysical model, DNA binding specificity
url	https://elifesciences.org/articles/06397

Similar Items