Summary: | The exposome recognizes that individuals are exposed simultaneously to a multitude of different environmental factors and takes a holistic approach to the discovery of etiological factors for disease. However, challenges arise when trying to quantify the health effects of complex exposure mixtures. Analytical challenges include dealing with high dimensionality, studying the combined effects of these exposures and their interactions, integrating causal pathways, and integrating high-throughput omics layers. To tackle these challenges, the Barcelona Institute for Global Health (ISGlobal) held a data challenge event open to researchers from all over the world and from all expertises. Analysts had a chance to compete and apply state-of-the-art methods on a common partially simulated exposome dataset (based on real case data from the HELIX project) with multiple correlated exposure variables (P > 100 exposure variables) arising from general and personal environments at different time points, biological molecular data (multi-omics: DNA methylation, gene expression, proteins, metabolomics) and multiple clinical phenotypes in 1301 mother–child pairs. Most of the methods presented included feature selection or feature reduction to deal with the high dimensionality of the exposome dataset. Several approaches explicitly searched for combined effects of exposures and/or their interactions using linear index models or response surface methods, including Bayesian methods. Other methods dealt with the multi-omics dataset in mediation analyses using multiple-step approaches. Here we discuss features of the statistical models used and provide the data and codes used, so that analysts have examples of implementation and can learn how to use these methods. Overall, the exposome data challenge presented a unique opportunity for researchers from different disciplines to create and share state-of-the-art analytical methods, setting a new standard for open science in the exposome and environmental health field.
|