Machine Learning Models for Identification and Prediction of Toxic Organic Compounds Using <i>Daphnia magna</i> Transcriptomic Profiles

Bibliographic Details
Main Authors: Tae-June Choi, Hyung-Eun An, Chang-Bae Kim
Format: Article
Language: English
Published: MDPI AG 2022-09-01
Series: Life
Online Access: https://www.mdpi.com/2075-1729/12/9/1443

Abstract
A wide range of environmental factors heavily impact aquatic ecosystems and, in turn, affect human health. Toxic organic compounds resulting from anthropogenic activity are one source of pollution in aquatic ecosystems. Current approaches to evaluating these contaminants rely mainly on acute and chronic toxicity tests, which cannot provide explicit insight into the causes of toxicity. As an alternative, genome-wide gene expression systems allow the contaminants responsible for toxicity to be identified by monitoring the organisms’ response to toxic substances. In this study, we selected 22 toxic organic compounds, classified as pesticides, herbicides, or industrial chemicals, that cause environmental problems in aquatic ecosystems and affect human health. To identify these compounds from <i>Daphnia magna</i> gene expression data, we evaluated the performance of three machine-learning-based feature-ranking algorithms (Learning Vector Quantization, Random Forest, and Support Vector Machines with a linear kernel) and nine classifiers (Linear Discriminant Analysis, Classification And Regression Trees, K-nearest neighbors, Support Vector Machines with a linear kernel, Random Forest, Boosted C5.0, Gradient Boosting Machine, eXtreme Gradient Boosting with a tree booster, and eXtreme Gradient Boosting with a DART booster). Our analysis revealed that combining feature selection based on feature ranking with a random forest classifier gave the best model performance, with an accuracy of 95.7%. This preliminary study is a step toward establishing a machine learning model for monitoring toxic substances in aquatic environments; such a model could be an effective tool for managing contaminants and toxic organic compounds in aquatic systems.
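
The two-stage design summarized in the abstract (rank genes, keep the top-ranked subset, then classify samples by compound) can be illustrated with a short sketch. This is a minimal, standard-library-only Python example under stated assumptions, not the authors' implementation: the expression data are synthetic, and the paper's Random Forest ranking and classifier are replaced by simpler stand-ins (a one-way ANOVA F-statistic for ranking and a nearest-centroid classifier). All function names and parameters are hypothetical. Feature ranking is refit inside each leave-one-out fold so that the held-out sample never influences which genes are selected.

```python
import random
import statistics
from collections import defaultdict


def make_synthetic_data(n_classes=3, per_class=6, n_genes=50, n_informative=5, seed=0):
    """Hypothetical expression matrix: rows are samples, columns are genes.

    Synthetic data for illustration only (NOT data from the study).
    Only the first `n_informative` genes shift with the compound label.
    """
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += 5.0 * c  # class-dependent shift on informative genes
            X.append(row)
            y.append(f"compound_{c}")
    return X, y


def f_score(column, labels):
    """One-way ANOVA F-statistic of one gene across classes (ranking stand-in)."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[label].append(value)
    k, n = len(groups), len(column)
    grand = statistics.fmean(column)
    means = {label: statistics.fmean(vals) for label, vals in groups.items()}
    between = sum(len(vals) * (means[label] - grand) ** 2 for label, vals in groups.items())
    within = sum((v - means[label]) ** 2 for label, vals in groups.items() for v in vals)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def rank_features(X, y):
    """Gene indices sorted from most to least discriminative."""
    scores = [f_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def fit_centroids(X, y, features):
    """Nearest-centroid classifier (stand-in): mean profile per class on selected genes."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append([row[j] for j in features])
    return {label: [statistics.fmean(col) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(centroids, row, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    x = [row[j] for j in features]
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))


def loocv_accuracy(X, y, k):
    """Leave-one-out accuracy; gene ranking is refit on each training fold to avoid leakage."""
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        top_k = rank_features(X_train, y_train)[:k]
        centroids = fit_centroids(X_train, y_train, top_k)
        correct += predict(centroids, X[i], top_k) == y[i]
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for k in (5, 50):
        print(f"top {k} genes: LOOCV accuracy = {loocv_accuracy(X, y, k):.3f}")
```

Swapping in an actual random forest (for example, scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute for ranking) would keep the same fold structure: rank on the training fold, select the top genes, fit, then score the held-out sample.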

Publication Details
Journal: Life (MDPI AG), ISSN 2075-1729
Volume 12, Issue 9, Article 1443, published 2022-09-01
DOI: 10.3390/life12091443
Affiliation: Department of Biotechnology, Sangmyung University, Seoul 03016, Korea (all three authors)
Keywords: environmental monitoring; aquatic ecosystem; toxic organic compounds; <i>Daphnia magna</i>; transcriptomic profiles; machine learning
Indexed in: Directory of Open Access Journals (DOAJ)