Compositional Data Analysis of Microbiome and Any-Omics Datasets: A Validation of the Additive Logratio Transformation
Microbiome and omics datasets are, by their intrinsic biological nature, of high dimensionality, characterized by counts of large numbers of components (microbial genes, operational taxonomic units, RNA transcripts, etc.). These data are generally regarded as compositional since the total number of...
Main Authors: | Michael Greenacre, Marina Martínez-Álvaro, Agustín Blasco |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2021-10-01 |
Series: | Frontiers in Microbiology |
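Subjects: | compositional data; dimension reduction; logratio transformation; logratio geometry; logratio variance; Procrustes correlation |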
Online Access: | https://www.frontiersin.org/articles/10.3389/fmicb.2021.727398/full |
_version_ | 1818578259964592128 |
---|---|
author | Michael Greenacre; Marina Martínez-Álvaro; Agustín Blasco |
author_sort | Michael Greenacre |
collection | DOAJ |
description | Microbiome and omics datasets are, by their intrinsic biological nature, of high dimensionality, characterized by counts of large numbers of components (microbial genes, operational taxonomic units, RNA transcripts, etc.). These data are generally regarded as compositional since the total number of counts identified within a sample is irrelevant. The central concept in compositional data analysis is the logratio transformation, the simplest being the additive logratios with respect to a fixed reference component. A full set of additive logratios is not isometric, that is they do not reproduce the geometry of all pairwise logratios exactly, but their lack of isometry can be measured by the Procrustes correlation. The reference component can be chosen to maximize the Procrustes correlation between the additive logratio geometry and the exact logratio geometry, and for high-dimensional data there are many potential references. As a secondary criterion, minimizing the variance of the reference component's log-transformed relative abundance values makes the subsequent interpretation of the logratios even easier. On each of three high-dimensional omics datasets the additive logratio transformation was performed, using references that were identified according to the abovementioned criteria. For each dataset the compositional data structure was successfully reproduced, that is the additive logratios were very close to being isometric. The Procrustes correlations achieved for these datasets were 0.9991, 0.9974, and 0.9902, respectively. We thus demonstrate, for high-dimensional compositional data, that additive logratios can provide a valid choice as transformed variables, which (a) are subcompositionally coherent, (b) explain 100% of the total logratio variance and (c) come measurably very close to being isometric. The interpretation of additive logratios is much simpler than the complex isometric alternatives and, when the variance of the log-transformed reference is very low, it is even simpler since each additive logratio can be identified with a corresponding compositional component. |
first_indexed | 2024-12-16T06:42:58Z |
format | Article |
id | doaj.art-daea650d4f064235b910b7b5fb78495d |
institution | Directory Open Access Journal |
issn | 1664-302X |
language | English |
last_indexed | 2024-12-16T06:42:58Z |
publishDate | 2021-10-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Microbiology |
spelling | doaj.art-daea650d4f064235b910b7b5fb78495d; 2022-12-21T22:40:38Z; eng; Frontiers Media S.A.; Frontiers in Microbiology; 1664-302X; 2021-10-01; 12; 10.3389/fmicb.2021.727398; 727398; Compositional Data Analysis of Microbiome and Any-Omics Datasets: A Validation of the Additive Logratio Transformation; Michael Greenacre (Department of Economics and Business, Universitat Pompeu Fabra, Barcelona, Spain); Marina Martínez-Álvaro (Department of Agriculture, Horticulture and Engineering Sciences, Scotland's Rural College, Edinburgh, United Kingdom); Agustín Blasco (Institute for Animal Science and Technology, Universitat Politècnica de València, València, Spain); https://www.frontiersin.org/articles/10.3389/fmicb.2021.727398/full; compositional data; dimension reduction; logratio transformation; logratio geometry; logratio variance; Procrustes correlation |
title | Compositional Data Analysis of Microbiome and Any-Omics Datasets: A Validation of the Additive Logratio Transformation |
title_sort | compositional data analysis of microbiome and any omics datasets a validation of the additive logratio transformation |
topic | compositional data; dimension reduction; logratio transformation; logratio geometry; logratio variance; Procrustes correlation |
url | https://www.frontiersin.org/articles/10.3389/fmicb.2021.727398/full |
work_keys_str_mv | AT michaelgreenacre compositionaldataanalysisofmicrobiomeandanyomicsdatasetsavalidationoftheadditivelogratiotransformation AT marinamartinezalvaro compositionaldataanalysisofmicrobiomeandanyomicsdatasetsavalidationoftheadditivelogratiotransformation AT agustinblasco compositionaldataanalysisofmicrobiomeandanyomicsdatasetsavalidationoftheadditivelogratiotransformation |
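
The reference selection summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes strictly positive data (zeros would need to be replaced beforehand), uses unweighted centered logratios (CLR) as a stand-in for the exact logratio geometry, and computes a symmetric Procrustes correlation from the singular values of the cross-product of the two centered, unit-norm configurations. The function names `clr`, `alr`, `procrustes_correlation`, and `best_reference` are hypothetical, and the demonstration data are random numbers, not any of the datasets from the article.

```python
import numpy as np


def clr(counts):
    """Centered logratios: log of each part minus the row mean of the logs."""
    logx = np.log(counts)
    return logx - logx.mean(axis=1, keepdims=True)


def alr(counts, ref):
    """Additive logratios of every part against the reference column `ref`."""
    logx = np.log(counts)
    # Subtract the reference log, then drop the (all-zero) reference column.
    return np.delete(logx - logx[:, [ref]], ref, axis=1)


def procrustes_correlation(a, b):
    """Symmetric Procrustes correlation between two sample configurations.

    Both matrices are column-centered and scaled to unit Frobenius norm;
    the correlation is then the sum of singular values of a.T @ b.
    """
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.linalg.svd(a.T @ b, compute_uv=False).sum())


def best_reference(counts):
    """Return (index, correlation) of the reference that maximizes the
    Procrustes correlation between the ALR and CLR geometries."""
    exact = clr(counts)
    scores = [procrustes_correlation(exact, alr(counts, j))
              for j in range(counts.shape[1])]
    best = int(np.argmax(scores))
    return best, scores[best]


if __name__ == "__main__":
    # Illustrative random positive data: 30 samples, 8 components.
    rng = np.random.default_rng(0)
    data = rng.lognormal(size=(30, 8))
    ref, corr = best_reference(data)
    # Secondary criterion from the abstract: variance of the log relative
    # abundance of the chosen reference (lower eases interpretation).
    rel = data / data.sum(axis=1, keepdims=True)
    print(ref, corr, np.log(rel[:, ref]).var())
```

The search is exhaustive over all components, which is affordable here because each candidate needs only one small singular value decomposition. With two components the ALR and CLR configurations are both one-dimensional, so the correlation is exactly 1, which makes a convenient sanity check.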