A structured overview of simultaneous component based data integration


Bibliographic Details
Main Authors: van der Werf Mariët J, Smilde Age K, Van Deun Katrijn, Kiers Henk AL, Van Mechelen Iven
Format: Article
Language: English
Published: BMC 2009-08-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/10/246
collection DOAJ
description Abstract

Background: Data integration is currently one of the main challenges in the biomedical sciences. Often different pieces of information are gathered on the same set of entities (e.g., tissues, culture samples, biomolecules) with the different pieces stemming, for example, from different measurement techniques. This implies that more and more data appear that consist of two or more data arrays that have a shared mode. An integrative analysis of such coupled data should be based on a simultaneous analysis of all data arrays. In this respect, the family of simultaneous component methods (e.g., SUM-PCA, unrestricted PCovR, MFA, STATIS, and SCA-P) is a natural choice. Yet, different simultaneous component methods may lead to quite different results.

Results: We offer a structured overview of simultaneous component methods that frames them in a principal components setting such that both the common core of the methods and the specific elements with regard to which they differ are highlighted. An overview of principles is given that may guide the data analyst in choosing an appropriate simultaneous component method. Several theoretical and practical issues are illustrated with an empirical example on metabolomics data for Escherichia coli as obtained with different analytical chemical measurement methods.

Conclusion: Of the aspects in which the simultaneous component methods differ, pre-processing and weighting are consequential. Especially, the type of weighting of the different matrices is essential for simultaneous component analysis. These types are shown to be linked to different specifications of the idea of a fair integration of the different coupled arrays.
id doaj.art-8e36ca45d71442718e7c3bb2fb5830af
institution Directory Open Access Journal
issn 1471-2105
doi 10.1186/1471-2105-10-246