Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design

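Abstract: Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, this work leverages machine learning (ML) algorithms and automated synthesis technology to predict PNA synthesis efficiency and guide rational PNA sequence design. The training data is collected from individual fluorenylmethyloxycarbonyl (Fmoc) deprotection reactions performed on a fully automated PNA synthesizer. The optimized ML model allows for 93% prediction accuracy and 0.97 Pearson's r. The predicted synthesis scores are validated to be correlated with the experimental high‐performance liquid chromatography (HPLC) crude purities (correlation coefficient R² = 0.95). Furthermore, a general applicability of ML is demonstrated through designing synthetically accessible antisense PNA sequences from 102 315 predicted candidates targeting exon 44 of the human dystrophin gene, SARS‐CoV‐2, HIV, as well as selected genes associated with cardiovascular diseases, type II diabetes, and various cancers. Collectively, ML provides an accurate prediction of PNA synthesis quality and serves as a useful computational tool for informing PNA sequence design.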

Bibliographic Details
Main Authors: Chengxi Li, Genwei Zhang, Somesh Mohapatra, Alex J. Callahan, Andrei Loas, Rafael Gómez‐Bombarelli, Bradley L. Pentelute
Format: Article
Language: English
Published: Wiley 2022-12-01
Series: Advanced Science
Subjects: automated synthesis, drug design, machine learning, peptide nucleic acid, yield prediction
Online Access: https://doi.org/10.1002/advs.202201988
author Chengxi Li
Genwei Zhang
Somesh Mohapatra
Alex J. Callahan
Andrei Loas
Rafael Gómez‐Bombarelli
Bradley L. Pentelute
collection DOAJ
description Abstract: Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, this work leverages machine learning (ML) algorithms and automated synthesis technology to predict PNA synthesis efficiency and guide rational PNA sequence design. The training data is collected from individual fluorenylmethyloxycarbonyl (Fmoc) deprotection reactions performed on a fully automated PNA synthesizer. The optimized ML model allows for 93% prediction accuracy and 0.97 Pearson's r. The predicted synthesis scores are validated to be correlated with the experimental high‐performance liquid chromatography (HPLC) crude purities (correlation coefficient R² = 0.95). Furthermore, a general applicability of ML is demonstrated through designing synthetically accessible antisense PNA sequences from 102 315 predicted candidates targeting exon 44 of the human dystrophin gene, SARS‐CoV‐2, HIV, as well as selected genes associated with cardiovascular diseases, type II diabetes, and various cancers. Collectively, ML provides an accurate prediction of PNA synthesis quality and serves as a useful computational tool for informing PNA sequence design.
first_indexed 2024-04-11T13:42:45Z
format Article
id doaj.art-48f8867a3109422590dbda03779410d5
institution Directory Open Access Journal
issn 2198-3844
language English
last_indexed 2024-04-11T13:42:45Z
publishDate 2022-12-01
publisher Wiley
record_format Article
series Advanced Science
spelling doaj.art-48f8867a3109422590dbda03779410d5
2022-12-22T04:21:12Z
eng
Wiley
Advanced Science
2198-3844
2022-12-01
9
34
n/a
n/a
10.1002/advs.202201988
Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design
Chengxi Li, Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
Genwei Zhang, Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
Somesh Mohapatra, Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
Alex J. Callahan, Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
Andrei Loas, Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
Rafael Gómez‐Bombarelli, Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
Bradley L. Pentelute, Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
Abstract: Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, this work leverages machine learning (ML) algorithms and automated synthesis technology to predict PNA synthesis efficiency and guide rational PNA sequence design. The training data is collected from individual fluorenylmethyloxycarbonyl (Fmoc) deprotection reactions performed on a fully automated PNA synthesizer. The optimized ML model allows for 93% prediction accuracy and 0.97 Pearson's r. The predicted synthesis scores are validated to be correlated with the experimental high‐performance liquid chromatography (HPLC) crude purities (correlation coefficient R² = 0.95). Furthermore, a general applicability of ML is demonstrated through designing synthetically accessible antisense PNA sequences from 102 315 predicted candidates targeting exon 44 of the human dystrophin gene, SARS‐CoV‐2, HIV, as well as selected genes associated with cardiovascular diseases, type II diabetes, and various cancers. Collectively, ML provides an accurate prediction of PNA synthesis quality and serves as a useful computational tool for informing PNA sequence design.
https://doi.org/10.1002/advs.202201988
automated synthesis
drug design
machine learning
peptide nucleic acid
yield prediction
title Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design
topic automated synthesis
drug design
machine learning
peptide nucleic acid
yield prediction
url https://doi.org/10.1002/advs.202201988