Unleashing high content screening in hit detection – Benchmarking AI workflows including novelty detection
Complex mixtures containing natural products are still an interesting source of novel drug candidates. High content screening (HCS) is a popular tool to screen for them. In particular, multiplexed HCS assays promise comprehensive bioactivity profiles, but they also generate large amounts of data. Yet, onl...
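Main Authors: | Erwin Kupczyk, Kenji Schorpp, Kamyar Hadian, Sean Lin, Dimitrios Tziotis, Philippe Schmitt-Kopplin, Constanze Mueller |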
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2022-01-01 |
Series: | Computational and Structural Biotechnology Journal |
Subjects: | High-content screening, Machine learning, Deep learning, Classifier, Novelty detection, Bioactives |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2001037022004299 |
_version_ | 1797978146426847232 |
---|---|
author | Erwin Kupczyk, Kenji Schorpp, Kamyar Hadian, Sean Lin, Dimitrios Tziotis, Philippe Schmitt-Kopplin, Constanze Mueller |
collection | DOAJ |
description | Complex mixtures containing natural products are still an interesting source of novel drug candidates. High content screening (HCS) is a popular tool to screen for them. In particular, multiplexed HCS assays promise comprehensive bioactivity profiles, but they also generate large amounts of data. Yet only a few machine learning (ML) applications for data analysis are available, and these usually require profound knowledge of the underlying cell biology. Unfortunately, there are no applications that simply predict whether samples are biologically active or not (any kind of bioactivity). In this work, we benchmark ML algorithms for binary classification, starting with classical ML models, which are the standard classifiers of the scikit-learn library or ensemble models of these classifiers (92 models tested in total), followed by partial least squares regression (PLSR)-based classification (44 models tested in total) and simple artificial neural networks (ANNs) with dense layers (72 models tested in total). In addition, novelty detection (ND), which is intended to handle unknown patterns, was examined. For the final analysis, the models, with and without upstream ND, were tested on two independent data sets. In our analysis, a stacking model, an ensemble model of classical ML algorithms, performed best at predicting new and unknown data. ND improved the predictions of the models and was useful for handling unknown patterns. Importantly, the classifier presented here can easily be rebuilt and adapted to the data and demands of other groups. The hit detector (ND + stacking model) is universal and suitable for broader application to support the search for new drug candidates. |
first_indexed | 2024-04-11T05:18:17Z |
format | Article |
id | doaj.art-4aa5cd7a00e84195a70b11e89e400f7e |
institution | Directory of Open Access Journals |
issn | 2001-0370 |
language | English |
last_indexed | 2024-04-11T05:18:17Z |
publishDate | 2022-01-01 |
publisher | Elsevier |
record_format | Article |
series | Computational and Structural Biotechnology Journal |
spelling | doaj.art-4aa5cd7a00e84195a70b11e89e400f7e; 2022-12-24T04:54:31Z; eng; Elsevier; Computational and Structural Biotechnology Journal; 2001-0370; 2022-01-01; volume 20, pages 5453-5465; Unleashing high content screening in hit detection – Benchmarking AI workflows including novelty detection. Author affiliations: [0] Erwin Kupczyk: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany; Comprehensive Foodomics Platform, Chair of Analytical Food Chemistry, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 2, 85354 Freising, Germany. [1] Kenji Schorpp: Institute for Molecular Toxicology and Pharmacology, Cell Signaling and Chemical Biology, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany. [2] Kamyar Hadian: Institute for Molecular Toxicology and Pharmacology, Cell Signaling and Chemical Biology, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany. [3] Sean Lin: Institute for Molecular Toxicology and Pharmacology, Cell Signaling and Chemical Biology, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany. [4] Dimitrios Tziotis: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany. [5] Philippe Schmitt-Kopplin: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany; Comprehensive Foodomics Platform, Chair of Analytical Food Chemistry, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 2, 85354 Freising, Germany; Corresponding author at: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany. [6] Constanze Mueller: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany; Corresponding author at: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany. |
title | Unleashing high content screening in hit detection – Benchmarking AI workflows including novelty detection |
topic | High-content screening, Machine learning, Deep learning, Classifier, Novelty detection, Bioactives |
url | http://www.sciencedirect.com/science/article/pii/S2001037022004299 |
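
Illustrative note: the description field names a "hit detector" that places novelty detection (ND) upstream of a stacking model built from scikit-learn classifiers. The record does not state which detector, base classifiers, or hyperparameters the authors used, so the following is only a minimal sketch under assumptions: `LocalOutlierFactor` stands in for the ND step, a random forest and an SVM stand in for the base models, the data are synthetic, and the function names `build_hit_detector` and `predict_hits` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_hit_detector(X_train, y_train):
    """Fit a scaler, a novelty detector, and a stacking classifier.

    All model choices here are assumptions made for illustration,
    not the configuration reported in the article.
    """
    scaler = StandardScaler().fit(X_train)
    X_scaled = scaler.transform(X_train)

    # Novelty detection: learns what the training data look like.
    novelty = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_scaled)

    # Stacking: base classifiers feed a logistic-regression meta-learner.
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("svc", SVC(random_state=0)),
        ],
        final_estimator=LogisticRegression(),
        cv=3,
    ).fit(X_scaled, y_train)
    return scaler, novelty, stack


def predict_hits(detector, X):
    """Return (labels, is_novel) for the rows of X.

    labels   : 0/1 predictions of the stacking classifier
    is_novel : True where the novelty detector sees an unknown pattern
    How novel samples are handled afterwards is left to the caller.
    """
    scaler, novelty, stack = detector
    X_scaled = scaler.transform(X)
    is_novel = novelty.predict(X_scaled) == -1  # -1 marks a novelty
    labels = stack.predict(X_scaled)
    return labels, is_novel


# Synthetic stand-in for HCS feature vectors (inactive = 0, active = 1).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

detector = build_hit_detector(X_train, y_train)

# Append one sample far outside the training distribution.
X_new = np.vstack([X_test, np.full((1, X.shape[1]), 50.0)])
labels, is_novel = predict_hits(detector, X_new)
```

Returning the novelty mask separately, rather than overwriting the classifier's labels, keeps the sketch neutral about how flagged samples should be treated; a screening group could, for instance, route them to manual review.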