Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data
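Main Authors: | Angelica M. Walker, Ashley Cliff, Jonathon Romero, Manesh B. Shah, Piet Jones, Joao Gabriel Felipe Machado Gazolla, Daniel A Jacobson, David Kainer |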
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2022-01-01 |
Series: | Computational and Structural Biotechnology Journal |
Subjects: | Random forest; Iterative random forest; Gene expression networks; Network biology |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2001037022002513 |
author | Angelica M. Walker, Ashley Cliff, Jonathon Romero, Manesh B. Shah, Piet Jones, Joao Gabriel Felipe Machado Gazolla, Daniel A Jacobson, David Kainer |
collection | DOAJ |
description | Gene-to-gene networks, such as Gene Regulatory Networks (GRN) and Predictive Expression Networks (PEN), capture relationships between genes and are useful in downstream biological analyses. Multiple network inference tools exist to produce these gene-to-gene networks from matrices of gene expression data. Random Forest-Leave One Out Prediction (RF-LOOP), frequently known as GEne Network Inference with Ensemble of trees (GENIE3), has been shown to be efficient at producing these gene-to-gene networks. Random Forest can be replaced in this process by iterative Random Forest (iRF), which performs variable selection and boosting. Here we validate that iterative Random Forest-Leave One Out Prediction (iRF-LOOP) produces higher-quality networks than GENIE3 (RF-LOOP). We use both synthetic and empirical networks from the Dialogue for Reverse Engineering Assessment and Methods (DREAM) Challenges by Sage Bionetworks, as well as two additional empirical networks created from Arabidopsis thaliana and Populus trichocarpa expression data. |
id | doaj.art-7fd29270bbdc4a87a52f15b41f73d2d1 |
institution | Directory of Open Access Journals |
issn | 2001-0370 |
affiliations | Angelica M. Walker, Ashley Cliff, Jonathon Romero, Piet Jones: The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee Knoxville, 821 Volunteer Blvd, Knoxville 37996, TN, USA. Manesh B. Shah, Joao Gabriel Felipe Machado Gazolla, Daniel A Jacobson (corresponding author), David Kainer (corresponding author): Computational and Predictive Biology, Oak Ridge National Laboratory, 1 Bethel Valley Rd, Oak Ridge 37830, TN, USA |
volume | 20 |
pages | 3372-3386 |
topic | Random forest; Iterative random forest; Gene expression networks; Network biology |
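The RF-LOOP (GENIE3) procedure named in the abstract treats each gene in turn as a regression target, trains a random forest to predict it from all the other genes, and reads the forest's feature importances as directed edge weights. The sketch below illustrates that loop under stated assumptions: it uses scikit-learn's `RandomForestRegressor` with impurity-based importances, the function name `rf_loop` and its parameters are hypothetical, `max_features="sqrt"` is an assumed (common) setting, and it omits the iterative feature reweighting that iRF adds. It is not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_loop(expr: np.ndarray, n_trees: int = 100, seed: int = 0) -> np.ndarray:
    """Hypothetical RF-LOOP (GENIE3-style) sketch, not the authors' code.

    expr: samples x genes matrix of expression values.
    Returns W (genes x genes), where W[i, j] is the importance of gene i
    as a predictor of gene j; the diagonal stays zero.
    """
    n_genes = expr.shape[1]
    weights = np.zeros((n_genes, n_genes))
    for target in range(n_genes):
        # Leave the target gene out: every other gene is a candidate predictor.
        predictors = [g for g in range(n_genes) if g != target]
        model = RandomForestRegressor(
            n_estimators=n_trees, max_features="sqrt", random_state=seed
        )
        model.fit(expr[:, predictors], expr[:, target])
        # Impurity-based importances (normalized to sum to 1) become the
        # weights of the directed edges predictor -> target.
        weights[predictors, target] = model.feature_importances_
    return weights


# Toy data: gene 1 is a noisy linear function of gene 0; genes 2-4 are noise.
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 5))
expr[:, 1] = 2.0 * expr[:, 0] + 0.1 * rng.normal(size=200)

W = rf_loop(expr)
print("strongest predictor of gene 1:", int(np.argmax(W[:, 1])))
```

Because gene 1 is built from gene 0 in the toy data, the column of `W` for gene 1 should put most of its weight on gene 0. Ranking all off-diagonal entries of `W` yields a candidate edge list that could then be scored against a gold-standard network, such as those from the DREAM Challenges.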