Analysis of epidemiological association patterns of serum thyrotropin by combining random forests and Bayesian networks.

Background: Approaching epidemiological data with flexible machine learning algorithms is of great value for understanding disease-specific association patterns. However, it can be difficult to correctly extract and understand those patterns due to the lack of model interpretabili...

Bibliographic Details
Main Authors: Ann-Kristin Becker, Till Ittermann, Markus Dörr, Stephan B Felix, Matthias Nauck, Alexander Teumer, Uwe Völker, Henry Völzke, Lars Kaderali, Neetika Nath
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0271610
author Ann-Kristin Becker
Till Ittermann
Markus Dörr
Stephan B Felix
Matthias Nauck
Alexander Teumer
Uwe Völker
Henry Völzke
Lars Kaderali
Neetika Nath
collection DOAJ
description Background: Approaching epidemiological data with flexible machine learning algorithms is of great value for understanding disease-specific association patterns. However, it can be difficult to correctly extract and understand those patterns due to the lack of model interpretability. Method: We propose a machine learning workflow that combines random forests with Bayesian network surrogate models to allow a deeper interpretation of complex association patterns. We first evaluate the proposed workflow on synthetic data. We then apply it to data from the large population-based Study of Health in Pomerania (SHIP). Based on this combination, we discover and interpret broad patterns of individual serum TSH concentrations, an important marker of thyroid function. Results: Evaluations using simulated data show that feature associations can be correctly recovered by combining random forests and Bayesian networks. The presented model achieves predictive accuracy similar to that of state-of-the-art models (root mean square error of 0.66, mean absolute error of 0.55, coefficient of determination R² = 0.15). We identify 62 relevant features from the final random forest model, ranging from general health variables through dietary and genetic factors to physiological, hematological and hemostasis parameters. The Bayesian network model is used to put these features into context and make the black-box random forest model more understandable. Conclusion: We demonstrate that the combination of random forest and Bayesian network analysis helps to reveal and interpret broad association patterns of individual TSH concentrations. The discovered patterns are in line with state-of-the-art literature. They may be useful for future thyroid research and improved dosing of therapeutics.
first_indexed 2024-12-10T21:17:36Z
format Article
id doaj.art-14f023515ef44921a368174a33428a13
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-10T21:17:36Z
publishDate 2022-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-14f023515ef44921a368174a33428a13 2022-12-22T01:33:14Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2022-01-01 17 7 e0271610 10.1371/journal.pone.0271610 Analysis of epidemiological association patterns of serum thyrotropin by combining random forests and Bayesian networks. Ann-Kristin Becker Till Ittermann Markus Dörr Stephan B Felix Matthias Nauck Alexander Teumer Uwe Völker Henry Völzke Lars Kaderali Neetika Nath https://doi.org/10.1371/journal.pone.0271610
title Analysis of epidemiological association patterns of serum thyrotropin by combining random forests and Bayesian networks.
url https://doi.org/10.1371/journal.pone.0271610