Machine Aided Biological Discovery and Design

Advances in biotechnology and the life sciences are primarily driven by biologists conducting rigorous experimentation. However, biology is often too complex – with intractable combinatorial search spaces and functional landscapes – to comprehensively explore, understand, and engineer via iterative...


Bibliographic Details
Main Author: Saksena, Sachit Dinesh
Other Authors: Gifford, David K.
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/147252
collection MIT
description Advances in biotechnology and the life sciences are primarily driven by biologists conducting rigorous experimentation. However, biology is often too complex – with intractable combinatorial search spaces and functional landscapes – to comprehensively explore, understand, and engineer via iterative biological experimentation. Next-generation sequencing technologies have made it possible to measure biology at high throughput, giving observational insight into these complexities. Further, in recent years, it has become possible to both manipulate biological systems with fine-grained control and directly synthesize large libraries of DNA molecules with specified sequences, providing unprecedented ability to engineer biology. We explore the thesis that computational methods that are built with experimental considerations and trained on carefully selected high-throughput experimental data can drive advances in the life sciences by making accurate predictions that can then be used to iteratively generate hypotheses and design biological sequences for further experimental validation. To test our thesis about the value of computational methods, we introduce and apply computational approaches for modeling cellular differentiation trajectories, identifying nonspecific antibodies, and designing diverse libraries of biological sequences that reflect desired objectives. First, we introduce a generative machine learning model for inferring cellular developmental landscapes from cross-sectional sequencing of in vitro differentiation time-series. We validate this model with ground-truth lineage tracing experiments, and we show its ability to conduct in silico simulations of cellular differentiation trajectories with perturbations. Next, we present a computational framework for using sequencing data from therapeutic discovery campaigns to identify nonspecific antibody therapeutics in large candidate pools. We show that this approach bypasses and outperforms costly combinatorial affinity selection experiments and allows the use of only single-target selection data to identify pairwise nonspecificity. Finally, we introduce an algorithm for the rational design of high-diversity synthetic antibody libraries using machine learning models and stochastic optimization. We show how this can be used to develop large libraries optimized for targets or developability characteristics, leading to more promising candidates from affinity selection.
first_indexed 2024-09-23T11:52:01Z
id mit-1721.1/147252
last_indexed 2024-09-23T11:52:01Z
record_format dspace
spelling mit-1721.1/147252 2023-01-20T03:36:33Z Machine Aided Biological Discovery and Design Saksena, Sachit Dinesh Gifford, David K. Massachusetts Institute of Technology. Computational and Systems Biology Program Ph.D. 2023-01-19T18:40:30Z 2023-01-19T18:40:30Z 2022-09 2022-10-07T20:55:15.698Z Thesis https://hdl.handle.net/1721.1/147252 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology