Non-linear modeling of 1H NMR metabonomic data using kernel-based orthogonal projections to latent structures optimized by simulated annealing.

Linear multivariate projection methods are frequently applied for predictive modeling of spectroscopic data in metabonomic studies. The OPLS method is a commonly used computational procedure for characterizing spectral metabonomic data, largely due to its favorable model interpretation properties pr...

Fuld beskrivelse

Bibliografiske detaljer
Main Authors: Fonville, J, Bylesjö, M, Coen, M, Nicholson, J, Holmes, E, Lindon, J, Rantalainen, M
Format: Journal article
Sprog:English
Udgivet: 2011
Beskrivelse
Summary:Linear multivariate projection methods are frequently applied for predictive modeling of spectroscopic data in metabonomic studies. The OPLS method is a commonly used computational procedure for characterizing spectral metabonomic data, largely due to its favorable model interpretation properties providing separate descriptions of predictive variation and response-orthogonal structured noise. However, when the relationship between descriptor variables and the response is non-linear, conventional linear models will perform sub-optimally. In this study we have evaluated to what extent a non-linear model, kernel-based orthogonal projections to latent structures (K-OPLS), can provide enhanced predictive performance compared to the linear OPLS model. Just like its linear counterpart, K-OPLS provides separate model components for predictive variation and response-orthogonal structured noise. The improved model interpretation by this separate modeling is a property unique to K-OPLS in comparison to other kernel-based models. Simulated annealing (SA) was used for effective and automated optimization of the kernel-function parameter in K-OPLS (SA-K-OPLS). Our results reveal that the non-linear K-OPLS model provides improved prediction performance in three separate metabonomic data sets compared to the linear OPLS model. We also demonstrate how response-orthogonal K-OPLS components provide valuable biological interpretation of model and data. The metabonomic data sets were acquired using proton Nuclear Magnetic Resonance (NMR) spectroscopy, and include a study of the liver toxin galactosamine, a study of the nephrotoxin mercuric chloride and a study of Trypanosoma brucei brucei infection. Automated and user-friendly procedures for the kernel-optimization have been incorporated into version 1.1.1 of the freely available K-OPLS software package for both R and Matlab to enable easy application of K-OPLS for non-linear prediction modeling.