Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design
Abstract Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, this...
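Main Authors: | Chengxi Li, Genwei Zhang, Somesh Mohapatra, Alex J. Callahan, Andrei Loas, Rafael Gómez‐Bombarelli, Bradley L. Pentelute |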
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2022-12-01 |
Series: | Advanced Science |
Subjects: | automated synthesis, drug design, machine learning, peptide nucleic acid, yield prediction |
Online Access: | https://doi.org/10.1002/advs.202201988 |
_version_ | 1811186206264786944 |
---|---|
author | Chengxi Li Genwei Zhang Somesh Mohapatra Alex J. Callahan Andrei Loas Rafael Gómez‐Bombarelli Bradley L. Pentelute |
author_facet | Chengxi Li Genwei Zhang Somesh Mohapatra Alex J. Callahan Andrei Loas Rafael Gómez‐Bombarelli Bradley L. Pentelute |
author_sort | Chengxi Li |
collection | DOAJ |
description | Abstract Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, this work leverages machine learning (ML) algorithms and automated synthesis technology to predict PNA synthesis efficiency and guide rational PNA sequence design. The training data is collected from individual fluorenylmethyloxycarbonyl (Fmoc) deprotection reactions performed on a fully automated PNA synthesizer. The optimized ML model allows for 93% prediction accuracy and 0.97 Pearson's r. The predicted synthesis scores are validated to be correlated with the experimental high‐performance liquid chromatography (HPLC) crude purities (correlation coefficient R² = 0.95). Furthermore, a general applicability of ML is demonstrated through designing synthetically accessible antisense PNA sequences from 102 315 predicted candidates targeting exon 44 of the human dystrophin gene, SARS‐CoV‐2, HIV, as well as selected genes associated with cardiovascular diseases, type II diabetes, and various cancers. Collectively, ML provides an accurate prediction of PNA synthesis quality and serves as a useful computational tool for informing PNA sequence design. |
first_indexed | 2024-04-11T13:42:45Z |
format | Article |
id | doaj.art-48f8867a3109422590dbda03779410d5 |
institution | Directory Open Access Journal |
issn | 2198-3844 |
language | English |
last_indexed | 2024-04-11T13:42:45Z |
publishDate | 2022-12-01 |
publisher | Wiley |
record_format | Article |
series | Advanced Science |
spelling | doaj.art-48f8867a3109422590dbda03779410d5 2022-12-22T04:21:12Z eng Wiley Advanced Science 2198-3844 2022-12-01 9 34 n/a n/a 10.1002/advs.202201988 Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design Chengxi Li 0 Genwei Zhang 1 Somesh Mohapatra 2 Alex J. Callahan 3 Andrei Loas 4 Rafael Gómez‐Bombarelli 5 Bradley L. Pentelute 6 Department of Chemistry Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA Department of Chemistry Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA Department of Materials Science and Engineering Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA Department of Chemistry Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA Department of Chemistry Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA Department of Materials Science and Engineering Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA Department of Chemistry Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA Abstract Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, this work leverages machine learning (ML) algorithms and automated synthesis technology to predict PNA synthesis efficiency and guide rational PNA sequence design. The training data is collected from individual fluorenylmethyloxycarbonyl (Fmoc) deprotection reactions performed on a fully automated PNA synthesizer. The optimized ML model allows for 93% prediction accuracy and 0.97 Pearson's r. The predicted synthesis scores are validated to be correlated with the experimental high‐performance liquid chromatography (HPLC) crude purities (correlation coefficient R² = 0.95). Furthermore, a general applicability of ML is demonstrated through designing synthetically accessible antisense PNA sequences from 102 315 predicted candidates targeting exon 44 of the human dystrophin gene, SARS‐CoV‐2, HIV, as well as selected genes associated with cardiovascular diseases, type II diabetes, and various cancers. Collectively, ML provides an accurate prediction of PNA synthesis quality and serves as a useful computational tool for informing PNA sequence design. https://doi.org/10.1002/advs.202201988 automated synthesis drug design machine learning peptide nucleic acid yield prediction |
spellingShingle | Chengxi Li Genwei Zhang Somesh Mohapatra Alex J. Callahan Andrei Loas Rafael Gómez‐Bombarelli Bradley L. Pentelute Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design Advanced Science automated synthesis drug design machine learning peptide nucleic acid yield prediction |
title | Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design |
title_full | Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design |
title_fullStr | Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design |
title_full_unstemmed | Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design |
title_short | Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design |
title_sort | machine learning guides peptide nucleic acid flow synthesis and sequence design |
topic | automated synthesis drug design machine learning peptide nucleic acid yield prediction |
url | https://doi.org/10.1002/advs.202201988 |
work_keys_str_mv | AT chengxili machinelearningguidespeptidenucleicacidflowsynthesisandsequencedesign AT genweizhang machinelearningguidespeptidenucleicacidflowsynthesisandsequencedesign AT someshmohapatra machinelearningguidespeptidenucleicacidflowsynthesisandsequencedesign AT alexjcallahan machinelearningguidespeptidenucleicacidflowsynthesisandsequencedesign AT andreiloas machinelearningguidespeptidenucleicacidflowsynthesisandsequencedesign AT rafaelgomezbombarelli machinelearningguidespeptidenucleicacidflowsynthesisandsequencedesign AT bradleylpentelute machinelearningguidespeptidenucleicacidflowsynthesisandsequencedesign |