seqgra: principled selection of neural network architectures for genomics prediction tasks

Bibliographic Details
Main Authors:	Krismer, Konstantin; Hammelman, Jennifer; Gifford, David K
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format:	Article
Language:	English
Published:	Oxford University Press (OUP) 2022
Online Access:	https://hdl.handle.net/1721.1/143575

Abstract
Motivation: Sequence models based on deep neural networks have achieved state-of-the-art performance on regulatory genomics prediction tasks, such as chromatin accessibility and transcription factor binding. Despite their high accuracy, however, their contributions to a mechanistic understanding of the biology of regulatory elements are often hindered by the complexity of the predictive model and the resulting poor interpretability of its decision boundaries. To address this, we introduce seqgra, a deep learning pipeline that incorporates the rule-based simulation of biological sequence data as well as the training and evaluation of models whose decision boundaries mirror the rules of the simulation process.
Results: We show that seqgra can be used to (i) generate data under the assumption of a hypothesized model of genome regulation, (ii) identify neural network architectures capable of recovering the rules of said model and (iii) analyze a model's predictive performance as a function of training set size and the complexity of the rules behind the simulated data.
Availability and implementation: The source code of the seqgra package is hosted on GitHub (https://github.com/gifford-lab/seqgra). seqgra is a pip-installable Python package. Extensive documentation can be found at https://kkrismer.github.io/seqgra.
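
The idea of simulating sequence data under a known rule, so that a trained model's decision boundary can be compared against that rule, can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions and does not use seqgra's API: the single rule ("label 1 if and only if a motif occurs"), the uniform background, and the names `simulate`, `rule`, and the motif `TGACTCA` are all hypothetical choices made for this example.

```python
import random

ALPHABET = "ACGT"


def random_background(length: int, rng: random.Random) -> str:
    """Draw an i.i.d. uniform DNA background sequence (hypothetical background model)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def rule(seq: str, motif: str) -> int:
    """Hypothetical ground-truth rule a trained model should recover: 1 iff motif present."""
    return int(motif in seq)


def simulate(n: int, length: int, motif: str, seed: int = 0) -> list[tuple[str, int]]:
    """Simulate n labeled sequences whose labels follow `rule` exactly.

    Classes are balanced (alternating labels). Positives get the motif embedded
    at a uniformly random position; negatives are resampled until the motif is
    absent. The resampling loop is only practical when the motif is rare in
    random background (e.g. a 7-mer in sequences of ~100 bp).
    """
    if not motif or len(motif) > length:
        raise ValueError("motif must be non-empty and no longer than the sequence")
    rng = random.Random(seed)  # seeded for reproducible simulations
    examples = []
    for i in range(n):
        label = i % 2
        seq = random_background(length, rng)
        if label == 1:
            start = rng.randrange(length - len(motif) + 1)
            seq = seq[:start] + motif + seq[start + len(motif):]
        else:
            while motif in seq:
                seq = random_background(length, rng)
        examples.append((seq, label))
    return examples


# Usage: every simulated label agrees with the rule by construction.
data = simulate(n=1000, length=100, motif="TGACTCA", seed=42)
print(all(rule(s, "TGACTCA") == y for s, y in data))
```

Because negatives are resampled rather than left to chance, the simulated labels match `rule` with no noise, which is what makes it possible to ask whether a trained network has recovered the rule itself rather than a correlated shortcut. Varying `n` or adding further rules would correspond, in spirit, to the training-set-size and rule-complexity analyses described in the abstract.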

Publication Details
Departments:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Biological Engineering; Massachusetts Institute of Technology. Computational and Systems Biology Program; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal:	Bioinformatics, 38 (9)
DOI:	10.1093/bioinformatics/btac101
Citation:	Krismer, Konstantin, Hammelman, Jennifer and Gifford, David K. 2022. "seqgra: principled selection of neural network architectures for genomics prediction tasks." Bioinformatics, 38 (9).
Type:	Journal article (http://purl.org/eprint/type/JournalArticle)
File format:	application/pdf
License:	Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/)
Date issued:	2022-04-28
Added to repository:	2022-06-28