Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data

Gene-to-gene networks, such as Gene Regulatory Networks (GRN) and Predictive Expression Networks (PEN), capture relationships between genes and are beneficial for use in downstream biological analyses. There exist multiple network inference tools to produce these gene-to-gene networks from matrices...


Bibliographic Details
Main Authors: Angelica M. Walker, Ashley Cliff, Jonathon Romero, Manesh B. Shah, Piet Jones, Joao Gabriel Felipe Machado Gazolla, Daniel A Jacobson, David Kainer
Format: Article
Language: English
Published: Elsevier 2022-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: Random forest, Iterative random forest, Gene expression networks, Network biology
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037022002513
author Angelica M. Walker
Ashley Cliff
Jonathon Romero
Manesh B. Shah
Piet Jones
Joao Gabriel Felipe Machado Gazolla
Daniel A Jacobson
David Kainer
collection DOAJ
description Gene-to-gene networks, such as Gene Regulatory Networks (GRN) and Predictive Expression Networks (PEN), capture relationships between genes and are beneficial for use in downstream biological analyses. There exist multiple network inference tools to produce these gene-to-gene networks from matrices of gene expression data. Random Forest-Leave One Out Prediction (RF-LOOP), frequently known as GEne Network Inference with Ensemble of trees (GENIE3), is a method that has been shown to be efficient at producing these gene-to-gene networks. Random Forest can be replaced in this process by iterative Random Forest (iRF), which performs variable selection and boosting. Here we validate that iterative Random Forest-Leave One Out Prediction (iRF-LOOP) produces higher quality networks than GENIE3 (RF-LOOP). We use both synthetic and empirical networks from the Dialogue for Reverse Engineering Assessment and Methods (DREAM) Challenges by Sage Bionetworks, as well as two additional empirical networks created from Arabidopsis thaliana and Populus trichocarpa expression data.
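The leave-one-out prediction idea named in the description can be illustrated with a minimal sketch: each gene in turn is treated as the target, a random forest is fit on all remaining genes, and the resulting feature importances become directed edge weights. This is not the authors' implementation or the GENIE3 package. It assumes scikit-learn's RandomForestRegressor as the forest, and the function name rf_loop, its parameters, and the toy data are hypothetical. The iterative (iRF) variant is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_loop(expr, n_trees=100, seed=0):
    """Leave-one-out prediction over genes (illustrative sketch only).

    expr: array of shape (n_samples, n_genes).
    Returns an (n_genes, n_genes) matrix W where W[i, j] is the importance
    of gene i as a predictor of gene j. The diagonal stays zero.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes = expr.shape[1]
    weights = np.zeros((n_genes, n_genes))
    for target in range(n_genes):
        # Every gene except the target serves as a predictor.
        predictors = [g for g in range(n_genes) if g != target]
        model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        model.fit(expr[:, predictors], expr[:, target])
        # Impurity-based importances become the edge weights into `target`.
        weights[predictors, target] = model.feature_importances_
    return weights


# Toy data (assumed for illustration): gene 1 depends strongly on gene 0,
# genes 2 and 3 are independent noise.
rng = np.random.default_rng(0)
g0 = rng.normal(size=80)
g1 = 2.0 * g0 + 0.1 * rng.normal(size=80)
g2 = rng.normal(size=80)
g3 = rng.normal(size=80)
expr = np.column_stack([g0, g1, g2, g3])

w = rf_loop(expr, n_trees=50)
print(np.round(w, 2))
```

In this toy matrix the edge from gene 0 to gene 1 should carry most of the weight in column 1. Thresholding or ranking the entries of the weight matrix would then give a network, which is a design choice left outside this sketch.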
first_indexed 2024-04-11T05:19:36Z
format Article
id doaj.art-7fd29270bbdc4a87a52f15b41f73d2d1
institution Directory Open Access Journal
issn 2001-0370
language English
last_indexed 2024-04-11T05:19:36Z
publishDate 2022-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj.art-7fd29270bbdc4a87a52f15b41f73d2d1 2022-12-24T04:53:05Z eng Elsevier Computational and Structural Biotechnology Journal 2001-0370 2022-01-01 20 3372 3386
Angelica M. Walker, Ashley Cliff, Jonathon Romero, Piet Jones: The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee Knoxville, 821 Volunteer Blvd, Knoxville 37996, TN, USA
Manesh B. Shah, Joao Gabriel Felipe Machado Gazolla, Daniel A Jacobson (corresponding author), David Kainer (corresponding author): Computational and Predictive Biology, Oak Ridge National Laboratory, 1 Bethel Valley Rd, Oak Ridge 37830, TN, USA
title Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data
topic Random forest
Iterative random forest
Gene expression networks
Network biology
url http://www.sciencedirect.com/science/article/pii/S2001037022002513