Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data

Bibliographic Details
Main Authors: Pocock Matthew R, Oinn Tom, Withers David, Owen Stuart, Soiland-Reyes Stian, Wassink Ingo, Velarde Giles, Castrillo Juan I, Li Peter, Goble Carole A, Oliver Stephen G, Kell Douglas B
Format: Article
Language: English
Published: BMC 2008-08-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/334
author Pocock Matthew R
Oinn Tom
Withers David
Owen Stuart
Soiland-Reyes Stian
Wassink Ingo
Velarde Giles
Castrillo Juan I
Li Peter
Goble Carole A
Oliver Stephen G
Kell Douglas B
collection DOAJ
description Abstract
Background: There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic, since data may have to be manually copied and pasted between the separate user interfaces of each tool. Furthermore, this transfer of data may require a reconciliation step before the computational tools can interoperate.
Results: Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data, followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed in R to identify differentially-expressed genes via the Taverna RShell processor, which was developed to invoke R when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services, as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV-formatted data within the Taverna workbench.
Conclusion: Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide a way for other scientists to use tools such as R to analyse their own data without having to learn the corresponding programming language.
first_indexed 2024-12-11T15:20:01Z
format Article
id doaj.art-6e34f5835e0a41c7b7d54e834d4fd72b
institution Directory of Open Access Journals
issn 1471-2105
language English
last_indexed 2024-12-11T15:20:01Z
publishDate 2008-08-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-6e34f5835e0a41c7b7d54e834d4fd72b; 2022-12-22T01:00:25Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2008-08-01; 9; 1; 334; 10.1186/1471-2105-9-334; http://www.biomedcentral.com/1471-2105/9/334
title Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data
url http://www.biomedcentral.com/1471-2105/9/334