Proteomics biomarker discovery for individualized prevention of familial pancreatic cancer using statistical learning.

Bibliographic Details
Main Authors: Chung Shing Rex Ha, Martina Müller-Nurasyid, Agnese Petrera, Stefanie M Hauck, Federico Marini, Detlef K Bartsch, Emily P Slater, Konstantin Strauch
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2023-01-01
Series: PLoS ONE
ISSN: 1932-6203
Online Access:https://doi.org/10.1371/journal.pone.0280399

Abstract

Background
The low five-year survival rate of pancreatic ductal adenocarcinoma (PDAC) and the low rate at which imaging detects early-stage PDAC highlight the need to discover novel biomarkers and improve current screening procedures for early diagnosis. Familial pancreatic cancer (FPC) refers to PDAC occurring in at least two first-degree relatives within a family. Using high-throughput proteomics, we quantified the protein profiles of at-risk individuals from FPC families at different potential pre-cancer stages. However, the high dimensionality of proteomics data challenges traditional statistical analysis tools. We therefore applied advanced statistical learning methods to strengthen the analysis and improve the interpretability of the results.

Methods
We applied model-based gradient boosting and adaptive lasso, both of which perform variable selection and model fitting simultaneously, to handle the small, unbalanced study design. In addition, we used stability selection to identify a stable subset of the selected biomarkers, which makes the results easier to interpret. At each step, we compared the performance of the different analytical pipelines and validated our approaches in simulation scenarios.

Results
In the simulation study, model-based gradient boosting predicted more accurately than adaptive lasso on small, unbalanced, high-dimensional datasets and identified more relevant variables. Using model-based gradient boosting, we also discovered a subset of promising serum biomarkers that may improve the current screening procedure for FPC.

Conclusion
Advanced statistical learning methods helped us overcome the shortcomings of an unbalanced study design in a valuable clinical dataset. The discovered serum biomarkers provide a clear direction for further investigation and for more precise clinical hypotheses about the development of FPC and optimal strategies for its early detection.
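
The Methods paragraph combines model-based (componentwise) gradient boosting with stability selection. Below is a minimal, dependency-free Python sketch of that combination. It assumes, for simplicity, a binary outcome coded 0/1, a logistic loss, simple linear base learners (one per feature), and class-stratified half-subsamples to respect the unbalanced design. The function names `boost_select` and `stability_selection` and every default (step length `nu`, number of selected variables `q`, frequency `threshold`, `max_iter`) are hypothetical illustration choices, not the authors' pipeline, and adaptive lasso is not shown. Established implementations exist in the R packages `mboost` and `stabs`.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def boost_select(X, y, q, nu=0.1, max_iter=200):
    """Componentwise gradient boosting with a logistic (binomial) loss.

    X is a list of n rows with p features, y a list of 0/1 labels.
    In every iteration each feature is fitted on its own (least squares on
    centred data, no intercept) to the negative gradient y - sigmoid(f), and
    only the best-fitting feature updates the model. Boosting stops once q
    distinct features are selected or after max_iter iterations.
    Returns the set of selected feature indices.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    ss = [sum(r[j] ** 2 for r in Xc) for j in range(p)]
    # Offset: log-odds of the observed case rate (clipped to avoid log(0)).
    rate = min(max(sum(y) / n, 1e-6), 1 - 1e-6)
    f = [math.log(rate / (1 - rate))] * n
    selected = set()
    for _ in range(max_iter):
        u = [yi - sigmoid(fi) for yi, fi in zip(y, f)]  # negative gradient
        best_j, best_rss, best_b = None, float("inf"), 0.0
        for j in range(p):
            if ss[j] == 0:
                continue  # constant feature: nothing to fit
            b = sum(r[j] * ui for r, ui in zip(Xc, u)) / ss[j]
            rss = sum((ui - b * r[j]) ** 2 for r, ui in zip(Xc, u))
            if rss < best_rss:
                best_j, best_rss, best_b = j, rss, b
        if best_j is None:
            break
        # Shrunken update of the linear predictor along the chosen feature.
        f = [fi + nu * best_b * r[best_j] for fi, r in zip(f, Xc)]
        selected.add(best_j)
        if len(selected) >= q:
            break
    return selected


def stability_selection(X, y, q, n_subsamples=50, threshold=0.7, seed=0):
    """Per-feature selection frequencies over class-stratified half-samples.

    Returns (stable, freq): indices whose frequency reaches threshold,
    and the list of frequencies.
    """
    rng = random.Random(seed)
    cases = [i for i, yi in enumerate(y) if yi == 1]
    controls = [i for i, yi in enumerate(y) if yi == 0]
    counts = [0] * len(X[0])
    for _ in range(n_subsamples):
        # Half of the cases plus half of the controls keeps the class
        # imbalance of the full data in every subsample.
        idx = rng.sample(cases, len(cases) // 2) + rng.sample(controls, len(controls) // 2)
        for j in boost_select([X[i] for i in idx], [y[i] for i in idx], q):
            counts[j] += 1
    freq = [c / n_subsamples for c in counts]
    stable = [j for j, fr in enumerate(freq) if fr >= threshold]
    return stable, freq


if __name__ == "__main__":
    # Synthetic, unbalanced toy data: only features 0 and 1 carry signal.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(15)] for _ in range(80)]
    y = [1 if rng.random() < sigmoid(-1.5 + 3 * r[0] - 3 * r[1]) else 0 for r in X]
    stable, freq = stability_selection(X, y, q=4, n_subsamples=20)
    print("stable features:", stable)
    print("frequencies:", [round(fr, 2) for fr in freq])
```

Running the file prints the stable features and per-feature selection frequencies for the toy data; features 0 and 1 are constructed to carry the signal, but the exact frequencies depend on the random draws. Stopping each boosting run after `q` distinct features is what lets `q` and `threshold` be tied to an error bound in the original stability selection framework; this sketch does not compute that bound.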