Identifying biological pathway interrupting toxins using multi-tree ensembles

Bibliographic Details
Main Author: Gergo Barta
Format: Article
Language: English
Published: Frontiers Media S.A. 2016-08-01
Series: Frontiers in Environmental Science
Subjects: Classification; Toxicity; competition; challenge; Random forests; Tox21
Online Access: http://journal.frontiersin.org/Journal/10.3389/fenvs.2016.00052/full
collection DOAJ
description The pharmaceutical industry constantly seeks new ways to improve the methods that scientists use to evaluate environmental chemicals and develop new medicines. Various automated steps are involved in the process, as testing hundreds of thousands of chemicals manually would be infeasible. Our research effort and the Toxicology in the 21st Century Data Challenge focused on cost-effective automation of toxicological testing, a chemical substance screening process that looks for possible toxic effects caused by interrupting biological pathways. The computational models we propose in this paper successfully combine various publicly available substance fingerprinting tools with advanced machine learning techniques. In our paper, we explore the significance and utility of assorted feature selection methods, as the structural analyzers generate a plethora of features for each substance. Machine learning models were carefully selected and evaluated based on their capability to cope with the high-dimensional, high-variety data, with multi-tree ensemble methods coming out on top. Techniques like Random forests and Extra trees combine numerous simple tree models and proved to produce reliable predictions of toxic activity while being nearly non-parametric and insensitive to dimensionality extremes. The Tox21 Data Challenge contest offered a great platform to compare a wide range of solutions in a controlled and orderly manner. The results clearly demonstrate that the generic approach presented in this paper is comparable to advanced deep learning and domain-specific solutions. It even surpasses the competition in some nuclear receptor signaling and stress pathway assays, achieving an accuracy of up to 94 percent.
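The multi-tree ensemble idea in the description can be illustrated with a minimal, standard-library-only Python sketch. It is not the author's implementation: the toy fingerprint data, the activity rule (bits 3 and 7), the function names, and all parameter values are hypothetical, and dropping constant bits merely stands in for the feature selection methods the abstract mentions without naming. Each tree is grown on a bootstrap sample and tests a random subset of fingerprint bits at every node, in the spirit of Random forests; the ensemble predicts by majority vote.

```python
import random
from collections import Counter

def make_toy_data(n=300, n_bits=40, seed=1):
    """Hypothetical binary fingerprints; a substance is 'active' if bits 3 and 7 are set."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n)]
    for row in X:
        row[0] = 0  # a constant, uninformative bit
    y = [1 if row[3] and row[7] else 0 for row in X]
    return X, y

def drop_constant_features(X):
    """Simple feature selection: keep only columns that take more than one value."""
    return [j for j in range(len(X[0])) if len({row[j] for row in X}) > 1]

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, features, rng, depth=0, max_depth=6, n_try=20):
    """Grow one tree; each node scores a random subset of bits by weighted Gini impurity."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)  # leaf: a plain class label
    best = None
    for j in rng.sample(features, min(n_try, len(features))):
        left = [i for i, row in enumerate(X) if row[j] == 0]
        right = [i for i, row in enumerate(X) if row[j] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, j, left, right)
    if best is None:
        return majority(y)
    _, j, left, right = best
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  features, rng, depth + 1, max_depth, n_try)
    return (j, grow(left), grow(right))  # internal node: (bit index, 0-branch, 1-branch)

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        j, zero_branch, one_branch = tree
        tree = one_branch if row[j] else zero_branch
    return tree

def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    features = drop_constant_features(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], features, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in forest])

X, y = make_toy_data()
forest = fit_forest(X[:200], y[:200])
predictions = [predict_forest(forest, row) for row in X[200:]]
accuracy = sum(p == t for p, t in zip(predictions, y[200:])) / len(predictions)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

In practice, a maintained library implementation of Random forests or Extra trees would be the more likely choice; the sketch only shows why many simple, randomized trees can cope with a large number of mostly irrelevant fingerprint bits.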
id doaj.art-113277d374614dfbb9ec147cd7bcd20c
institution Directory Open Access Journal
issn 2296-665X
spelling doaj.art-113277d374614dfbb9ec147cd7bcd20c | 2022-12-22T03:08:59Z | eng | Frontiers Media S.A. | Frontiers in Environmental Science | 2296-665X | 2016-08-01 | 4 | 10.3389/fenvs.2016.00052 | 198181 | Identifying biological pathway interrupting toxins using multi-tree ensembles | Gergo Barta | 0 | Budapest University of Technology and Economics | http://journal.frontiersin.org/Journal/10.3389/fenvs.2016.00052/full | Classification | Toxicity | competition | challenge | Random forests | Tox21
title Identifying biological pathway interrupting toxins using multi-tree ensembles
topic Classification
Toxicity
competition
challenge
Random forests
Tox21
url http://journal.frontiersin.org/Journal/10.3389/fenvs.2016.00052/full