What can go wrong when observations are not independently and identically distributed: A cautionary note on calculating correlations on combined data sets from different experiments or conditions
In the scientific literature, results are often presented in which samples from different experiments or conditions, technical replicates, or time series are merged to increase the sample size before calculating the correlation coefficient. This way of proceeding violates two basi...
Main Author: | Edoardo Saccenti |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2023-01-01 |
Series: | Frontiers in Systems Biology |
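Subjects: | covariance; data fusion; data merging; Pearson’s correlation; repeated measures; Spearman’s correlation |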
Online Access: | https://www.frontiersin.org/articles/10.3389/fsysb.2023.1042156/full |
author | Edoardo Saccenti |
---|---|
collection | DOAJ |
description | In the scientific literature, results are often presented in which samples from different experiments or conditions, technical replicates, or time series are merged to increase the sample size before calculating the correlation coefficient. This way of proceeding violates two basic assumptions underlying the use of the correlation coefficient: sampling from one population and independence of the observations (independence of errors). Since correlations are used to measure and infer associations between biological entities, this has serious implications for the reliability of scientific results, as violating these assumptions leads to wrong and biased results. In this technical note, I review some basic properties of Pearson’s correlation coefficient and illustrate some exemplary problems with simulated and experimental data, taking a didactic approach supported by graphical examples. |
first_indexed | 2024-04-10T19:35:28Z |
format | Article |
id | doaj.art-4e40a63947d24603a4739a8bc9d99365 |
institution | Directory Open Access Journal |
issn | 2674-0702 |
language | English |
last_indexed | 2024-04-10T19:35:28Z |
publishDate | 2023-01-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Systems Biology |
title | What can go wrong when observations are not independently and identically distributed: A cautionary note on calculating correlations on combined data sets from different experiments or conditions |
topic | covariance; data fusion; data merging; Pearson’s correlation; repeated measures; Spearman’s correlation |
url | https://www.frontiersin.org/articles/10.3389/fsysb.2023.1042156/full |
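
The problem described in the abstract, that merging samples from different conditions violates the assumption of sampling from one population, can be illustrated with a minimal simulation. The sketch below is not taken from the article: the function names `pearson_r` and `simulate_condition`, the sample size, the seed, and the shift of 5 units between conditions are all assumptions chosen for illustration. Within each simulated condition the two variables are drawn independently, so any sizeable correlation in the pooled data comes only from the difference in means between conditions.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_condition(rng, n, shift):
    """Draw n independent (x, y) pairs; both variables share the mean `shift`.

    Hypothetical helper: x and y are generated independently of each other,
    so the population correlation within one condition is zero.
    """
    x = [rng.gauss(shift, 1.0) for _ in range(n)]
    y = [rng.gauss(shift, 1.0) for _ in range(n)]
    return x, y


rng = random.Random(0)  # fixed seed so the run is reproducible

# Two conditions that differ only in their mean level (assumed values).
x_a, y_a = simulate_condition(rng, 200, shift=0.0)
x_b, y_b = simulate_condition(rng, 200, shift=5.0)

r_a = pearson_r(x_a, y_a)                       # within condition A
r_b = pearson_r(x_b, y_b)                       # within condition B
r_pooled = pearson_r(x_a + x_b, y_a + y_b)      # after merging both conditions

print(f"condition A: r = {r_a:.2f}")
print(f"condition B: r = {r_b:.2f}")
print(f"pooled:      r = {r_pooled:.2f}")
```

In this setup the within-condition correlations should stay close to zero while the pooled correlation is large and positive, which reflects the mean shift between conditions rather than any association between the two variables.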