Unleashing high content screening in hit detection – Benchmarking AI workflows including novelty detection

Complex mixtures containing natural products remain an interesting source of novel drug candidates. High content screening (HCS) is a popular tool to screen for such candidates. In particular, multiplexed HCS assays promise comprehensive bioactivity profiles, but they also generate large amounts of data. Yet, onl...

Bibliographic Details
Main Authors: Erwin Kupczyk, Kenji Schorpp, Kamyar Hadian, Sean Lin, Dimitrios Tziotis, Philippe Schmitt-Kopplin, Constanze Mueller
Format: Article
Language: English
Published: Elsevier 2022-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: High-content screening, Machine learning, Deep learning, Classifier, Novelty detection, Bioactives
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037022004299
author Erwin Kupczyk
Kenji Schorpp
Kamyar Hadian
Sean Lin
Dimitrios Tziotis
Philippe Schmitt-Kopplin
Constanze Mueller
author_sort Erwin Kupczyk
collection DOAJ
description Complex mixtures containing natural products remain an interesting source of novel drug candidates. High content screening (HCS) is a popular tool to screen for such candidates. In particular, multiplexed HCS assays promise comprehensive bioactivity profiles, but they also generate large amounts of data. Yet only a few machine learning (ML) applications for data analysis are available, and these usually require profound knowledge of the underlying cell biology. Unfortunately, there are no applications that simply predict whether samples are biologically active or not (any kind of bioactivity). In this work, we benchmark ML algorithms for binary classification. We start with classical ML models, which are the standard classifiers of the scikit-learn library or ensemble models of these classifiers (92 models tested in total), followed by partial least squares regression (PLSR)-based classification (44 models tested in total) and simple artificial neural networks (ANNs) with dense layers (72 models tested in total). In addition, we examined novelty detection (ND), which is intended to handle unknown patterns. For the final analysis, the models, with and without upstream ND, were tested on two independent data sets. In our analysis, a stacking model, an ensemble model of classical ML algorithms, performed best at predicting new and unknown data. ND improved the predictions of the models and was useful for handling unknown patterns. Importantly, the classifier presented here can easily be rebuilt and adapted to the data and demands of other groups. The hit detector (ND + stacking model) is universal and suitable for broader application in support of the search for new drug candidates.
first_indexed 2024-04-11T05:18:17Z
format Article
id doaj.art-4aa5cd7a00e84195a70b11e89e400f7e
institution Directory Open Access Journal
issn 2001-0370
language English
last_indexed 2024-04-11T05:18:17Z
publishDate 2022-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj.art-4aa5cd7a00e84195a70b11e89e400f7e
2022-12-24T04:54:31Z
eng
Elsevier
Computational and Structural Biotechnology Journal
2001-0370
2022-01-01
Volume 20, pages 5453-5465
Erwin Kupczyk: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany; Comprehensive Foodomics Platform, Chair of Analytical Food Chemistry, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 2, 85354 Freising, Germany
Kenji Schorpp: Institute for Molecular Toxicology and Pharmacology, Cell Signaling and Chemical Biology, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany
Kamyar Hadian: Institute for Molecular Toxicology and Pharmacology, Cell Signaling and Chemical Biology, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany
Sean Lin: Institute for Molecular Toxicology and Pharmacology, Cell Signaling and Chemical Biology, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany
Dimitrios Tziotis: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany
Philippe Schmitt-Kopplin: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany; Comprehensive Foodomics Platform, Chair of Analytical Food Chemistry, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 2, 85354 Freising, Germany; Corresponding author at: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany.
Constanze Mueller: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany; Corresponding author at: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany.
title Unleashing high content screening in hit detection – Benchmarking AI workflows including novelty detection
topic High-content screening
Machine learning
Deep learning
Classifier
Novelty detection
Bioactives
url http://www.sciencedirect.com/science/article/pii/S2001037022004299
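The hit detector named in the description (novelty detection upstream of a stacking model) could be sketched roughly as follows with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not say which novelty detector or base classifiers were used, so LocalOutlierFactor, the random forest and k-nearest-neighbour base models, the logistic-regression final estimator, the UNKNOWN label, both helper functions, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier, LocalOutlierFactor

UNKNOWN = -1  # hypothetical label for samples flagged as novel


def fit_hit_detector(X_train, y_train):
    """Fit a novelty detector and a stacking classifier on the same data."""
    # Assumption: LocalOutlierFactor stands in for the ND step.
    novelty = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)
    # Assumption: two standard scikit-learn classifiers as base models.
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    ).fit(X_train, y_train)
    return novelty, stack


def predict_hits(novelty, stack, X):
    """Return 0/1 predictions, with novel samples relabelled as UNKNOWN."""
    labels = np.asarray(stack.predict(X)).copy()
    labels[novelty.predict(X) == -1] = UNKNOWN
    return labels


# Synthetic stand-in for HCS feature vectors (not real screening data).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
novelty, stack = fit_hit_detector(X[:200], y[:200])
labels = predict_hits(novelty, stack, X[200:])
print(dict(zip(*np.unique(labels, return_counts=True))))
```

In this sketch, samples the novelty detector does not recognise are reported separately instead of being forced into the active or inactive class; how the published workflow treats such samples is not stated in this record.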