Compositional Data Analysis of Microbiome and Any-Omics Datasets: A Validation of the Additive Logratio Transformation

Microbiome and omics datasets are, by their intrinsic biological nature, of high dimensionality, characterized by counts of large numbers of components (microbial genes, operational taxonomic units, RNA transcripts, etc.). These data are generally regarded as compositional since the total number of...

Bibliographic Details
Main Authors: Michael Greenacre, Marina Martínez-Álvaro, Agustín Blasco
Format: Article
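Language: English
Published: Frontiers Media S.A. 2021-10-01
Series: Frontiers in Microbiology
Subjects: compositional data; dimension reduction; logratio transformation; logratio geometry; logratio variance; Procrustes correlation
Online Access: https://www.frontiersin.org/articles/10.3389/fmicb.2021.727398/full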
collection DOAJ
description Microbiome and omics datasets are, by their intrinsic biological nature, of high dimensionality, characterized by counts of large numbers of components (microbial genes, operational taxonomic units, RNA transcripts, etc.). These data are generally regarded as compositional since the total number of counts identified within a sample is irrelevant. The central concept in compositional data analysis is the logratio transformation, the simplest being the additive logratios with respect to a fixed reference component. A full set of additive logratios is not isometric, that is, they do not reproduce the geometry of all pairwise logratios exactly, but their lack of isometry can be measured by the Procrustes correlation. The reference component can be chosen to maximize the Procrustes correlation between the additive logratio geometry and the exact logratio geometry, and for high-dimensional data there are many potential references. As a secondary criterion, minimizing the variance of the reference component's log-transformed relative abundance values makes the subsequent interpretation of the logratios even easier. On each of three high-dimensional omics datasets the additive logratio transformation was performed, using references that were identified according to the above-mentioned criteria. For each dataset the compositional data structure was successfully reproduced, that is, the additive logratios were very close to being isometric. The Procrustes correlations achieved for these datasets were 0.9991, 0.9974, and 0.9902, respectively. We thus demonstrate, for high-dimensional compositional data, that additive logratios can provide a valid choice as transformed variables, which (a) are subcompositionally coherent, (b) explain 100% of the total logratio variance, and (c) come measurably very close to being isometric. The interpretation of additive logratios is much simpler than the complex isometric alternatives and, when the variance of the log-transformed reference is very low, it is even simpler since each additive logratio can be identified with a corresponding compositional component.
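The reference-selection idea in the description can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions: all counts are strictly positive (no zero replacement is attempted), components are unweighted, the exact logratio geometry is represented by centered logratios, and the Procrustes correlation is computed as the sum of singular values of the cross-product of the two centered, unit-norm configurations. The function names (`clr`, `alr`, `procrustes_correlation`, `log_reference_variance`, `best_reference`) are hypothetical and chosen only for this illustration.

```python
import numpy as np


def clr(counts):
    """Centered logratios; their Euclidean geometry stands in for the exact logratio geometry."""
    logx = np.log(counts)
    return logx - logx.mean(axis=1, keepdims=True)


def alr(counts, ref):
    """Additive logratios log(x_j / x_ref) for every component j other than ref."""
    logx = np.log(counts)
    return np.delete(logx - logx[:, [ref]], ref, axis=1)


def procrustes_correlation(a, b):
    """Symmetric Procrustes correlation between two sample configurations.

    Both matrices are column-centered and scaled to unit Frobenius norm;
    the correlation is then the sum of singular values of a.T @ b.
    """
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return np.linalg.svd(a.T @ b, compute_uv=False).sum()


def log_reference_variance(counts, ref):
    """Variance across samples of the log relative abundance of the reference."""
    proportions = counts / counts.sum(axis=1, keepdims=True)
    return np.log(proportions[:, ref]).var(ddof=1)


def best_reference(counts):
    """Index of the reference maximizing the Procrustes correlation, plus all scores."""
    exact = clr(counts)
    scores = np.array([
        procrustes_correlation(exact, alr(counts, j))
        for j in range(counts.shape[1])
    ])
    return int(scores.argmax()), scores


# Usage on simulated, strictly positive data (samples in rows, components in columns).
rng = np.random.default_rng(1)
simulated = rng.uniform(1.0, 100.0, size=(30, 8))
ref, scores = best_reference(simulated)
secondary = log_reference_variance(simulated, ref)
```

Trying every component as the reference costs one singular value decomposition per candidate, which may become slow for very wide tables; the secondary criterion (low variance of the log-transformed reference) can then be used to choose among references whose correlations are nearly tied.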
first_indexed 2024-12-16T06:42:58Z
format Article
id doaj.art-daea650d4f064235b910b7b5fb78495d
institution Directory Open Access Journal
issn 1664-302X
language English
last_indexed 2024-12-16T06:42:58Z
publishDate 2021-10-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Microbiology
title Compositional Data Analysis of Microbiome and Any-Omics Datasets: A Validation of the Additive Logratio Transformation
topic compositional data
dimension reduction
logratio transformation
logratio geometry
logratio variance
Procrustes correlation
url https://www.frontiersin.org/articles/10.3389/fmicb.2021.727398/full