pulver: an R package for parallel ultra-rapid p-value computation for linear regression interaction terms

Abstract. Background: Genome-wide association studies allow us to understand the genetics of complex diseases. Human metabolism provides information about the disease-causing mechanisms, so it is usual to investigate the associations between genetic variants and metabolite levels. However, only consid...

Bibliographic Details
Main Authors:	Sophie Molnos, Clemens Baumbach, Simone Wahl, Martina Müller-Nurasyid, Konstantin Strauch, Rui Wang-Sattler, Melanie Waldenberger, Thomas Meitinger, Jerzy Adamski, Gabi Kastenmüller, Karsten Suhre, Annette Peters, Harald Grallert, Fabian J. Theis, Christian Gieger
Format:	Article
Language:	English
Published:	BMC 2017-09-01
Series:	BMC Bioinformatics
Subjects:	Algorithm; Linear regression interaction term; SNP–CpG interaction; Software
Online Access:	http://link.springer.com/article/10.1186/s12859-017-1838-y

collection	DOAJ
description	Abstract. Background: Genome-wide association studies allow us to understand the genetics of complex diseases. Human metabolism provides information about disease-causing mechanisms, so it is common to investigate the associations between genetic variants and metabolite levels. However, considering only genetic variants and their effects on one trait ignores the possible interplay between different “omics” layers. Existing tools consider only single-nucleotide polymorphism (SNP)–SNP interactions, and no practical tool is available for large-scale investigations of the interactions between pairs of arbitrary quantitative variables. Results: We developed an R package called pulver to compute p-values for the interaction term in a very large number of linear regression models. Comparisons based on simulated data showed that pulver is much faster than the existing tools. This speed is achieved by using the correlation coefficient to test the null hypothesis, which avoids the costly computation of matrix inversions. Additional speed-ups come from rearranging the iteration order across the different “omics” layers and from implementing the algorithm in C++. Furthermore, we applied our algorithm to data from the German KORA study to investigate a real-world problem involving the interplay among DNA methylation, genetic variants, and metabolite levels. Conclusions: The pulver package is a convenient and rapid tool for screening huge numbers of linear regression models for significant interaction terms in arbitrary pairs of quantitative variables. pulver is written in R and C++ and can be downloaded freely from CRAN at https://cran.r-project.org/web/packages/pulver/.
first_indexed	2024-12-22T12:54:08Z
format	Article
id	doaj.art-5050b8deacdc406baf1a8ec8049548a9
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-12-22T12:54:08Z
publishDate	2017-09-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-5050b8deacdc406baf1a8ec8049548a9; 2022-12-21T18:25:10Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2017-09-01; volume 18, issue 1, pages 1–8; 10.1186/s12859-017-1838-y
	Sophie Molnos: Research Unit of Molecular Epidemiology, Helmholtz Zentrum München
	Clemens Baumbach: Research Unit of Molecular Epidemiology, Helmholtz Zentrum München
	Simone Wahl: Research Unit of Molecular Epidemiology, Helmholtz Zentrum München
	Martina Müller-Nurasyid: Department of Medicine I, University Hospital Grosshadern, Ludwig-Maximilians-Universität
	Konstantin Strauch: Institute of Genetic Epidemiology, Helmholtz Zentrum München
	Rui Wang-Sattler: Research Unit of Molecular Epidemiology, Helmholtz Zentrum München
	Melanie Waldenberger: Research Unit of Molecular Epidemiology, Helmholtz Zentrum München
	Thomas Meitinger: Institute of Human Genetics, Helmholtz Zentrum München
	Jerzy Adamski: German Center for Diabetes Research (DZD)
	Gabi Kastenmüller: Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München
	Karsten Suhre: Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München
	Annette Peters: Research Unit of Molecular Epidemiology, Helmholtz Zentrum München
	Harald Grallert: Research Unit of Molecular Epidemiology, Helmholtz Zentrum München
	Fabian J. Theis: Institute of Computational Biology, Helmholtz Zentrum München
	Christian Gieger: Research Unit of Molecular Epidemiology, Helmholtz Zentrum München
title	pulver: an R package for parallel ultra-rapid p-value computation for linear regression interaction terms
topic	Algorithm; Linear regression interaction term; SNP–CpG interaction; Software
url	http://link.springer.com/article/10.1186/s12859-017-1838-y
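The abstract's central idea, testing the interaction term through a correlation coefficient rather than a matrix inversion, can be illustrated with a minimal sketch. It is not pulver's own code: the package is written in R and C++, while this uses plain Python for brevity, and the function names (`interaction_t` and its helpers) are hypothetical. It assumes the model y = b0 + b1*x + b2*z + b3*x*z + error and relies on the standard identity that the t statistic for b3 equals r * sqrt(dof / (1 - r^2)), where r is the partial correlation between y and x*z given x and z, and dof = n - 4.

```python
import math


def _center(v):
    """Subtract the mean, which projects out the intercept."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _residual(v, basis):
    """Remove from v its projection onto each vector of a mutually orthogonal basis."""
    out = list(v)
    for b in basis:
        coef = _dot(out, b) / _dot(b, b)
        out = [o - coef * bi for o, bi in zip(out, b)]
    return out


def interaction_t(y, x, z):
    """t statistic for b3 in y = b0 + b1*x + b2*z + b3*x*z + error.

    Hypothetical helper for illustration only. Degenerate inputs
    (constant or collinear x and z, or a perfect fit) are not handled.
    """
    n = len(y)
    xz = [a * b for a, b in zip(x, z)]

    # Orthogonal basis for the covariates: centred x, then centred z
    # with its x component removed.
    xc = _center(x)
    zc = _residual(_center(z), [xc])
    basis = [xc, zc]

    # Residuals of y and x*z after removing intercept, x and z.
    ry = _residual(_center(y), basis)
    rxz = _residual(_center(xz), basis)

    # Partial correlation, then the equivalent t statistic with n - 4 dof.
    r = _dot(ry, rxz) / math.sqrt(_dot(ry, ry) * _dot(rxz, rxz))
    dof = n - 4
    return r * math.sqrt(dof / (1.0 - r * r))
```

Only dot products and vector subtractions are needed, so no matrix is inverted. When one x is tested against many z, the residualised quantities that depend only on x can be reused. Turning the statistic into a p-value additionally requires the Student t distribution function with n - 4 degrees of freedom, which is omitted here.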
