A new pipeline for the normalization and pooling of metabolomics data

Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, this can raise methodological challenges as several preanalytical and analytical factors could introduce differences in measured concentrations and variability between datasets. Sp...

Bibliographic Details
Main Authors: Viallon, V, His, M, Rinaldi, S, Breeur, M, Gicquiau, A, Hemon, B, Overvad, K, Tjønneland, A, Rostgaard-Hansen, AL, Rothwell, JA, Lecuyer, L, Severi, G, Kaaks, R, Johnson, T, Schulze, MB, Palli, D, Agnoli, C, Panico, S, Tumino, R, Ricceri, F, Verschuren, WMM, Engelfriet, P, Onland-Moret, C, Vermeulen, R, Nøst, TH, Urbarova, I, Zamora-Ros, R, Rodriguez-Barranco, M, Amiano, P, Huerta, JM, Ardanaz, E, Melander, O, Ottoson, F, Vidman, L, Rentoft, M, Schmidt, JA, Travis, RC, Weiderpass, E, Johansson, M, Dossus, L, Jenab, M, Gunter, MJ, Lorenzo Bermejo, J, Scherer, D, Salek, RM, Keski-Rahkonen, P, Ferrari, P
Format: Journal article
Language: English
Published: MDPI 2021
collection OXFORD
description Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, this can raise methodological challenges as several preanalytical and analytical factors could introduce differences in measured concentrations and variability between datasets. Specifically, different studies may use different sample types (e.g., serum versus plasma) that are collected, treated, and stored according to different protocols, and assayed in different laboratories using different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a set of sequential steps: (i) exclusion of the least informative observations and metabolites, removal of outliers, and imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; (iii) application of linear mixed models to remove unwanted variability, including samples’ originating study and batch, and preserve biological variations while accounting for potential differences in the residual variances across studies. This pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of metabolomics measurements indicated that the pipeline improved the comparability of data across the studies. Our pipeline can be adapted to normalize other molecular data, including biomarkers as well as proteomics data, and could be used for pooling molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility of the pipeline makes our work of potential interest to molecular epidemiologists.
first_indexed 2024-03-06T20:46:06Z
format Journal article
id oxford-uuid:35f6a23f-4ce8-4d9f-91c3-752ee83b37e3
institution University of Oxford
language English
last_indexed 2024-03-06T20:46:06Z
publishDate 2021
publisher MDPI
record_format dspace
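
The sequential steps named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy data, the 50% missingness threshold, and the half-minimum imputation rule are all assumptions made for illustration. The last function replaces the linear mixed models of step (iii) with a much simpler per-study mean adjustment, so it does not model batch effects or study-specific residual variances. Step (ii), the PC-PR2 analysis, is not covered.

```python
from statistics import mean

# Hypothetical toy data: one dict per sample; None marks a missing measurement.
samples = [
    {"study": "A", "met1": 10.0, "met2": 1.0, "met3": None},
    {"study": "A", "met1": 12.0, "met2": None, "met3": None},
    {"study": "B", "met1": 20.0, "met2": 3.0, "met3": None},
    {"study": "B", "met1": 22.0, "met2": 5.0, "met3": 7.0},
]
METABOLITES = ["met1", "met2", "met3"]
MAX_MISSING = 0.5  # assumed threshold, not taken from the paper


def drop_sparse_metabolites(rows, metabolites, max_missing):
    """Part of step (i): keep metabolites whose missing fraction is <= max_missing."""
    kept = []
    for m in metabolites:
        missing = sum(1 for r in rows if r[m] is None) / len(rows)
        if missing <= max_missing:
            kept.append(m)
    return kept


def impute_half_minimum(rows, metabolites):
    """Part of step (i): replace missing values by half the smallest observed value
    (an assumed, commonly used convention)."""
    out = [dict(r) for r in rows]
    for m in metabolites:
        floor = min(r[m] for r in rows if r[m] is not None) / 2
        for r in out:
            if r[m] is None:
                r[m] = floor
    return out


def center_by_study(rows, metabolites):
    """Simplified stand-in for step (iii): shift each study so that its mean
    matches the overall mean. This is NOT a linear mixed model."""
    out = [dict(r) for r in rows]
    for m in metabolites:
        overall = mean(r[m] for r in rows)
        for study in {r["study"] for r in rows}:
            study_mean = mean(r[m] for r in rows if r["study"] == study)
            for r in out:
                if r["study"] == study:
                    r[m] = r[m] - study_mean + overall
    return out


kept = drop_sparse_metabolites(samples, METABOLITES, MAX_MISSING)
imputed = impute_half_minimum(samples, kept)
normalized = center_by_study(imputed, kept)
```

After the final step, both toy studies share the same mean for each retained metabolite, while within-study differences between samples are unchanged. A mixed-model fit would additionally shrink the study and batch effects and could allow a separate residual variance per study, as the description states.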