Identifying biological pathway interrupting toxins using multi-tree ensembles
Main Author: | Gergo Barta |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2016-08-01 |
Series: | Frontiers in Environmental Science |
Online Access: | http://journal.frontiersin.org/Journal/10.3389/fenvs.2016.00052/full |
author | Gergo Barta |
collection | DOAJ |
description | The pharmaceutical industry constantly seeks new ways to improve the methods scientists use to evaluate environmental chemicals and develop new medicines. The process involves various automated steps, since testing hundreds of thousands of chemicals manually would be infeasible. Our research effort and the Toxicology in the 21st Century (Tox21) Data Challenge focused on cost-effective automation of toxicological testing: a chemical screening process that looks for possible toxic effects caused by interrupting biological pathways. The computational models we propose in this paper combine various publicly available substance fingerprinting tools with advanced machine learning techniques. Because the structural analyzers generate a plethora of features for each substance, we explore the significance and utility of assorted feature selection methods. Machine learning models were selected and evaluated on their ability to cope with high-dimensional, highly varied data, and multi-tree ensemble methods came out on top. Techniques such as Random forests and Extra trees combine numerous simple tree models; they proved to produce reliable predictions of toxic activity while being nearly non-parametric and insensitive to dimensionality extremes. The Tox21 Data Challenge offered a good platform for comparing a wide range of solutions in a controlled and orderly manner. The results demonstrate that the generic approach presented in this paper is comparable to advanced deep learning and domain-specific solutions, even surpassing the competition in some nuclear receptor signaling and stress pathway assays and achieving an accuracy of up to 94 percent. |
issn | 2296-665X |
affiliation | Budapest University of Technology and Economics |
volume | 4 |
doi | 10.3389/fenvs.2016.00052 |
topic | Classification, Toxicity, competition, challenge, Random forests, Tox21 |
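The abstract's pipeline (binary substance fingerprints, feature selection, then a multi-tree ensemble) can be illustrated with a small, dependency-free sketch. Everything below is an assumption for illustration, not the paper's code: the variance filter stands in for the unspecified "assorted feature selection methods", the toy data is synthetic, and the ensemble uses depth-one trees (stumps) so it stays short. It keeps the two ingredients that define a Random forest, bootstrap sampling and random feature subsets, and averages the trees' leaf probabilities. Extra trees, which additionally randomize split thresholds, are not shown.

```python
import random


def variance_filter(X, min_var=0.01):
    """Indices of binary fingerprint bits whose variance p*(1-p) exceeds min_var."""
    n = len(X)
    keep = []
    for j in range(len(X[0])):
        p = sum(row[j] for row in X) / n  # fraction of substances with bit j set
        if p * (1 - p) > min_var:
            keep.append(j)
    return keep


def gini(labels):
    """Gini impurity of a list of 0/1 activity labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(X, y, features):
    """Depth-one tree: split on the candidate bit with the lowest weighted Gini impurity."""
    base = sum(y) / len(y)
    best = None
    for j in features:
        left = [yi for xi, yi in zip(X, y) if xi[j] == 0]
        right = [yi for xi, yi in zip(X, y) if xi[j] == 1]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            # Leaf value = fraction of active samples; use the overall rate for an empty side.
            p_left = sum(left) / len(left) if left else base
            p_right = sum(right) / len(right) if right else base
            best = (score, j, p_left, p_right)
    return best[1:]  # (feature index, left-leaf probability, right-leaf probability)


def fit_forest(X, y, n_trees=50, seed=0):
    """Bagged stumps, each restricted to a random subset of sqrt(d) features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, int(d ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        feats = rng.sample(range(d), m)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_proba(forest, x):
    """Average the leaf probabilities of all trees in the ensemble."""
    return sum(pr if x[j] == 1 else pl for j, pl, pr in forest) / len(forest)


# Hypothetical toy data: 8 random bits plus 2 constant bits; bit 0 alone marks activity.
rng = random.Random(1)
X = [[rng.randint(0, 1) for _ in range(8)] + [0, 0] for _ in range(200)]
y = [row[0] for row in X]

keep = variance_filter(X)  # drops the two constant bits
Xs = [[row[j] for j in keep] for row in X]
forest = fit_forest(Xs, y)
active = [1] + [0] * (len(keep) - 1)
inactive = [0] * len(keep)
print(keep, predict_proba(forest, active), predict_proba(forest, inactive))
```

Stumps that happen to draw the informative bit split on it perfectly, while stumps restricted to noise bits give the two query vectors identical scores, so averaging pushes the "active" substance's probability above the "inactive" one. In practice one would use an established library implementation with fully grown trees rather than this toy version.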