HyperCys: A Structure- and Sequence-Based Predictor of Hyper-Reactive Druggable Cysteines

Bibliographic Details
Main Authors: Mingjie Gao, Stefan Günther
Format: Article
Language: English
Published: MDPI AG 2023-03-01
Series: International Journal of Molecular Sciences
Subjects: machine learning; structure and sequence based; druggable cysteine; reactivity prediction
Online Access: https://www.mdpi.com/1422-0067/24/6/5960
author Mingjie Gao
Stefan Günther
collection DOAJ
description The cysteine side chain has a free thiol group, making it the amino acid residue most often covalently modified by small molecules possessing weakly electrophilic warheads, thereby prolonging on-target residence time and reducing the risk of idiosyncratic drug toxicity. However, not all cysteines are equally reactive or accessible. Hence, to identify targetable cysteines, we propose a novel ensemble stacked machine learning (ML) model to predict hyper-reactive druggable cysteines, named HyperCys. First, the pocket, conservation, structural and energy profiles, and physicochemical properties of (non)covalently bound cysteines were collected from both protein sequences and 3D structures of protein–ligand complexes. Then, we established the HyperCys ensemble stacked model by integrating six different ML models: K-nearest neighbors, support vector machine, light gradient boosting machine, multi-layer perceptron classifier, and random forest, with logistic regression as the meta-classifier. Finally, the results for different feature group combinations were compared on the basis of classification accuracy and other metrics for hyper-reactive cysteines. The results show that the accuracy, F1 score, recall score, and ROC AUC values of HyperCys are 0.784, 0.754, 0.742, and 0.824, respectively, after performing 10-fold cross-validation (CV) with the best window size. Compared to traditional ML models with only sequence-based features or only 3D structural features, HyperCys is more accurate at predicting hyper-reactive druggable cysteines. It is anticipated that HyperCys will be an effective tool for discovering new potential reactive cysteines in a wide range of nucleophilic proteins and will provide an important contribution to the design of targeted covalent inhibitors with high potency and selectivity.
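The stacking idea in the description can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: it assumes only two toy base learners (a k-nearest-neighbors vote and a nearest-centroid score, the latter swapped in purely for illustration) instead of the five named in the abstract, synthetic two-dimensional data instead of cysteine features, and five interleaved folds instead of 10-fold CV. All function names are hypothetical. What it does show is the general mechanism: base learners produce out-of-fold scores, and a logistic regression meta-classifier is trained on those scores.

```python
import math
import random

def knn_proba(train_X, train_y, x, k=3):
    """Toy base learner 1: fraction of positives among the k nearest points."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / k

def centroid_proba(train_X, train_y, x):
    """Toy base learner 2 (illustrative stand-in): closeness to the positive centroid."""
    def centroid(label):
        pts = [p for p, t in zip(train_X, train_y) if t == label]
        return [sum(c) / len(pts) for c in zip(*pts)]
    d_pos, d_neg = math.dist(x, centroid(1)), math.dist(x, centroid(0))
    return d_neg / (d_pos + d_neg + 1e-12)

BASE_LEARNERS = [knn_proba, centroid_proba]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def out_of_fold_features(X, y, n_folds=5):
    """Meta-features for each sample come from learners that never saw that sample."""
    meta = [None] * len(X)
    for fold in range(n_folds):
        held_out = set(range(fold, len(X), n_folds))  # interleaved folds
        tr_X = [X[i] for i in range(len(X)) if i not in held_out]
        tr_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in held_out:
            meta[i] = [f(tr_X, tr_y, X[i]) for f in BASE_LEARNERS]
    return meta

def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Meta-classifier: logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            grad_w = [g + err * zi for g, zi in zip(grad_w, z)]
            grad_b += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(Z)
    return w, b

def stacked_predict(X, y, w, b, x):
    """Base learners use the full training set; the meta-classifier combines them."""
    z = [f(X, y, x) for f in BASE_LEARNERS]
    return int(sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) >= 0.5)

def report(y_true, y_pred):
    """Accuracy, recall, and F1 computed from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": sum(t == p for t, p in pairs) / len(pairs), "recall": recall, "f1": f1}

rng = random.Random(0)

def blobs(n):
    """Synthetic data (an assumption): two Gaussian clusters, label 0 and label 1."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(n)]
    return X, [0] * n + [1] * n

X_train, y_train = blobs(40)
X_test, y_test = blobs(20)
w, b = fit_logistic(out_of_fold_features(X_train, y_train), y_train)
scores = report(y_test, [stacked_predict(X_train, y_train, w, b, x) for x in X_test])
print(scores)
```

Training the meta-classifier on out-of-fold scores rather than in-sample scores keeps it from simply trusting whichever base learner overfits the training data most.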
first_indexed 2024-03-11T06:23:41Z
format Article
id doaj.art-09d799a650cd47b1a97c9905d66c4cc7
institution Directory Open Access Journal
issn 1661-6596
1422-0067
language English
last_indexed 2024-03-11T06:23:41Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series International Journal of Molecular Sciences
spelling doaj.art-09d799a650cd47b1a97c9905d66c4cc7; 2023-11-17T11:41:35Z; eng; MDPI AG; International Journal of Molecular Sciences; ISSN 1661-6596, 1422-0067; 2023-03-01; vol. 24, no. 6, art. 5960; doi:10.3390/ijms24065960; HyperCys: A Structure- and Sequence-Based Predictor of Hyper-Reactive Druggable Cysteines; Mingjie Gao and Stefan Günther (both: Institute of Pharmaceutical Sciences, Albert-Ludwigs-Universität Freiburg, Hermann-Herder-Straße 9, 79104 Freiburg, Germany)
title HyperCys: A Structure- and Sequence-Based Predictor of Hyper-Reactive Druggable Cysteines
topic machine learning
structure and sequence based
druggable cysteine
reactivity prediction
url https://www.mdpi.com/1422-0067/24/6/5960