A new pipeline for the normalization and pooling of metabolomics data
Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, this can raise methodological challenges as several preanalytical and analytical factors could introduce differences in measured concentrations and variability between datasets. Sp...
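Main Authors: | Viallon, V; His, M; Rinaldi, S; Breeur, M; Gicquiau, A; Hemon, B; Overvad, K; Tjønneland, A; Rostgaard-Hansen, AL; Rothwell, JA; Lecuyer, L; Severi, G; Kaaks, R; Johnson, T; Schulze, MB; Palli, D; Agnoli, C; Panico, S; Tumino, R; Ricceri, F; Verschuren, WMM; Engelfriet, P; Onland-Moret, C; Vermeulen, R; Nøst, TH; Urbarova, I; Zamora-Ros, R; Rodriguez-Barranco, M; Amiano, P; Huerta, JM; Ardanaz, E; Melander, O; Ottoson, F; Vidman, L; Rentoft, M; Schmidt, JA; Travis, RC; Weiderpass, E; Johansson, M; Dossus, L; Jenab, M; Gunter, MJ; Lorenzo Bermejo, J; Scherer, D; Salek, RM; Keski-Rahkonen, P; Ferrari, P |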
---|---|
Format: | Journal article |
Language: | English |
Published: | MDPI 2021 |
_version_ | 1826266909671161856 |
---|---|
author | Viallon, V His, M Rinaldi, S Breeur, M Gicquiau, A Hemon, B Overvad, K Tjønneland, A Rostgaard-Hansen, AL Rothwell, JA Lecuyer, L Severi, G Kaaks, R Johnson, T Schulze, MB Palli, D Agnoli, C Panico, S Tumino, R Ricceri, F Verschuren, WMM Engelfriet, P Onland-Moret, C Vermeulen, R Nøst, TH Urbarova, I Zamora-Ros, R Rodriguez-Barranco, M Amiano, P Huerta, JM Ardanaz, E Melander, O Ottoson, F Vidman, L Rentoft, M Schmidt, JA Travis, RC Weiderpass, E Johansson, M Dossus, L Jenab, M Gunter, MJ Lorenzo Bermejo, J Scherer, D Salek, RM Keski-Rahkonen, P Ferrari, P |
author_sort | Viallon, V |
collection | OXFORD |
description | Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, this can raise methodological challenges as several preanalytical and analytical factors could introduce differences in measured concentrations and variability between datasets. Specifically, different studies may use variable sample types (e.g., serum versus plasma) collected, treated, and stored according to different protocols, and assayed in different laboratories using different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a set of sequential steps: (i) exclusions of the least informative observations and metabolites and removal of outliers; imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; (iii) application of linear mixed models to remove unwanted variability, including samples’ originating study and batch, and preserve biological variations while accounting for potential differences in the residual variances across studies. This pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of metabolomics measurements indicated that the pipeline improved the comparability of data across the studies. Our pipeline can be adapted to normalize other molecular data, including biomarkers as well as proteomics data, and could be used for pooling molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility of the pipeline makes our work of potential interest to molecular epidemiologists. |
first_indexed | 2024-03-06T20:46:06Z |
format | Journal article |
id | oxford-uuid:35f6a23f-4ce8-4d9f-91c3-752ee83b37e3 |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-06T20:46:06Z |
publishDate | 2021 |
publisher | MDPI |
record_format | dspace |
spelling | oxford-uuid:35f6a23f-4ce8-4d9f-91c3-752ee83b37e3; 2022-03-26T13:34:59Z; Journal article; http://purl.org/coar/resource_type/c_dcae04bc; uuid:35f6a23f-4ce8-4d9f-91c3-752ee83b37e3; English; Symplectic Elements; MDPI; 2021 |
title | A new pipeline for the normalization and pooling of metabolomics data |