What can go wrong when observations are not independently and identically distributed: A cautionary note on calculating correlations on combined data sets from different experiments or conditions

Bibliographic Details
Main Author:	Edoardo Saccenti
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2023-01-01
Series:	Frontiers in Systems Biology
Subjects:	covariance; data fusion; data merging; Pearson’s correlation; repeated measures; Spearman’s correlation
Online Access:	https://www.frontiersin.org/articles/10.3389/fsysb.2023.1042156/full
ISSN:	2674-0702

Description

In the scientific literature, data analysis results are often presented in which samples from different experiments or conditions, technical replicates, or time series are merged to increase the sample size before the correlation coefficient is calculated. This practice violates two basic assumptions underlying the use of the correlation coefficient: that the samples are drawn from a single population, and that the observations are independent (independence of errors). Since correlations are used to measure and infer associations between biological entities, this has tremendous implications for the reliability of scientific results, as violating these assumptions leads to wrong and biased results. In this technical note, I review some basic properties of Pearson’s correlation coefficient and illustrate some exemplary problems with simulated and experimental data, taking a didactic approach supported by graphical examples.
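The pooling problem described above can be illustrated with a small simulation. This is a minimal sketch, not code from the note: the two hypothetical conditions, the group size of 200, the mean offset of 3, and the helper names (`simulate_condition`, `center`) are illustrative assumptions. Within each condition, x and y are generated independently, so their true correlation is zero; the only link between them is a shared condition-specific mean shift, such as a batch or treatment effect might produce. Merging the two conditions before computing Pearson’s r then produces a clearly positive correlation that does not exist within either condition.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_condition(n, mean_shift, rng):
    """Hypothetical condition: x and y are drawn independently (true r = 0),
    but both share the same condition-specific mean offset."""
    x = [rng.gauss(mean_shift, 1.0) for _ in range(n)]
    y = [rng.gauss(mean_shift, 1.0) for _ in range(n)]
    return x, y


def center(values):
    """Subtract the mean of one condition from its own observations."""
    m = sum(values) / len(values)
    return [v - m for v in values]


rng = random.Random(42)
xa, ya = simulate_condition(200, 0.0, rng)  # condition A
xb, yb = simulate_condition(200, 3.0, rng)  # condition B, shifted means

r_a = pearson(xa, ya)
r_b = pearson(xb, yb)
# Naive approach: merge both conditions, then correlate.
r_pooled = pearson(xa + xb, ya + yb)
# Remove each condition's means first, then pool the residuals.
r_centered = pearson(center(xa) + center(xb), center(ya) + center(yb))

print(f"within A: {r_a:.2f}  within B: {r_b:.2f}")
print(f"naively pooled: {r_pooled:.2f}")
print(f"pooled after per-condition centering: {r_centered:.2f}")
```

Centering each condition before pooling removes the spurious between-condition component in this toy case, but it is only one simple adjustment. It uses up one degree of freedom per condition and variable, so the usual n - 2 significance test for r would need adjusting, and dedicated approaches such as repeated-measures correlation or mixed-effects models may be more appropriate for real data.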

Similar Items