CSM-Toxin: A Web-Server for Predicting Protein Toxicity

Biologics are one of the most rapidly expanding classes of therapeutics, but can be associated with a range of toxic properties. In small-molecule drug development, early identification of potential toxicity has led to a significant reduction in clinical trial failures; however, we currently lack robust...


Bibliographic Details
Main Authors: Vladimir Morozov, Carlos H. M. Rodrigues, David B. Ascher
Format: Article
Language: English
Published: MDPI AG 2023-01-01
Series: Pharmaceutics
Subjects: protein toxicity; sequence; deep-learning
Online Access: https://www.mdpi.com/1999-4923/15/2/431
author Vladimir Morozov
Carlos H. M. Rodrigues
David B. Ascher
collection DOAJ
description Biologics are one of the most rapidly expanding classes of therapeutics, but can be associated with a range of toxic properties. In small-molecule drug development, early identification of potential toxicity has led to a significant reduction in clinical trial failures; however, we currently lack robust qualitative rules or predictive tools for peptide- and protein-based biologics. To address this, we have manually curated the largest set of high-quality experimental data on peptide and protein toxicities, and developed CSM-Toxin, a novel in-silico protein toxicity classifier that relies solely on the protein primary sequence. Our approach encodes the protein sequence information using a deep learning natural language model to understand “biological” language, where residues are treated as words and protein sequences as sentences. CSM-Toxin was able to accurately identify peptides and proteins with potential toxicity, achieving an MCC of up to 0.66 across both cross-validation and multiple non-redundant blind tests, outperforming other methods and highlighting the robust and generalisable performance of our model. We strongly believe CSM-Toxin will serve as a valuable platform to minimise potential toxicity in the biologic development pipeline. Our method is freely available as an easy-to-use webserver.
first_indexed 2024-03-11T08:16:36Z
format Article
id doaj.art-23980b1651b546c6b331a318d2ab02b2
institution Directory Open Access Journal
issn 1999-4923
language English
last_indexed 2024-03-11T08:16:36Z
publishDate 2023-01-01
publisher MDPI AG
record_format Article
series Pharmaceutics
spelling doaj.art-23980b1651b546c6b331a318d2ab02b2; 2023-11-16T22:39:50Z; eng; MDPI AG; Pharmaceutics; 1999-4923; 2023-01-01; 15; 2; 431; 10.3390/pharmaceutics15020431; CSM-Toxin: A Web-Server for Predicting Protein Toxicity
Vladimir Morozov (0), School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
Carlos H. M. Rodrigues (1), School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
David B. Ascher (2), School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
title CSM-Toxin: A Web-Server for Predicting Protein Toxicity
topic protein toxicity
sequence
deep-learning
url https://www.mdpi.com/1999-4923/15/2/431