On the analysis of glycomics mass spectrometry data via the regularized area under the ROC curve

Bibliographic Details
Main Authors: Lebrilla Carlito B, Kirmiz Crystal, Liu Hao, Ye Jingjing, Rocke David M
Format: Article
Language: English
Published: BMC 2007-12-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/8/477
author Lebrilla Carlito B
Kirmiz Crystal
Liu Hao
Ye Jingjing
Rocke David M
collection DOAJ
description Abstract

Background

Novel molecular and statistical methods are in rising demand for disease diagnosis and prognosis, aided by recent advances in biotechnology. High-resolution mass spectrometry (MS) is one of the biotechnologies that hold great promise for improving health outcomes. Previous studies have identified proteomic biomarkers that can distinguish healthy individuals from cancer patients using MS data. In this paper, an MS study is presented that uses glycomics to identify ovarian cancer. Glycomics is the study of glycans and glycoproteins. The glycans on proteins may differ between cancer cells and normal cells, and these differences may be detectable in the blood. High-resolution MS has been applied to measure the relative abundances of potential glycan biomarkers in human serum, and multiple potential glycan biomarkers are measured in each MS spectrum. With the objective of maximizing the empirical area under the ROC curve (AUC), an analysis method was considered that combines potential glycan biomarkers for the diagnosis of cancer.

Results

Maximizing the empirical AUC of glycomics MS data is a high-dimensional optimization problem. The technical difficulty is that the empirical AUC function is not continuous. Instead, it is an empirical 0–1 loss function with a large number of linear predictors. An approach was investigated that regularizes the area under the ROC curve while replacing the 0–1 loss function with a smooth surrogate function. The constrained threshold gradient descent regularization algorithm was applied, with the regularization parameters chosen by cross-validation and the confidence intervals of the regression parameters estimated by the bootstrap. The method is called the TGDR-AUC algorithm. The properties of the approach were studied through a numerical simulation study that incorporates the positivity of mass spectrometry measurements and the correlations between measurements within a person. The simulation supported the asymptotic property that the estimated AUC approaches the true AUC. Finally, mass spectrometry data on serum glycans for ovarian cancer diagnosis were analyzed. The optimal combination based on the TGDR-AUC algorithm yields plausible results, and the detected biomarkers are supported by biological evidence.

Conclusion

The TGDR-AUC algorithm relaxes the normality and independence assumptions made in the previous literature. In addition to being flexible and easy to interpret, the algorithm performs well in combining potential biomarkers and is computationally feasible. Thus, TGDR-AUC is a plausible algorithm for classifying disease status on the basis of multiple biomarkers.
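The idea described in the Results paragraph can be illustrated with a minimal sketch. It is not the authors' implementation: the sigmoid surrogate, the `scale`, `threshold`, `step` and `n_steps` parameters, all function names, and the toy data are assumptions made for illustration, and the cross-validation, bootstrap, and constraint steps named in the abstract are omitted. The sketch computes the empirical AUC of a linear score, replaces the 0–1 indicator with a sigmoid, and runs a threshold gradient ascent in which only the coordinates with the largest gradient magnitude are updated at each step.

```python
import math


def empirical_auc(beta, cases, controls):
    """Share of (case, control) pairs ranked correctly by the linear score.

    Ties count as 1/2. This is the discontinuous 0-1 objective.
    """
    def score(x):
        return sum(b * v for b, v in zip(beta, x))

    total = 0.0
    for sc in (score(x) for x in cases):
        for sn in (score(x) for x in controls):
            if sc > sn:
                total += 1.0
            elif sc == sn:
                total += 0.5
    return total / (len(cases) * len(controls))


def sigmoid(t):
    """Numerically stable logistic function (assumed surrogate for the indicator)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def smoothed_auc_gradient(beta, cases, controls, scale):
    """Gradient of the sigmoid-smoothed AUC with respect to beta."""
    grad = [0.0] * len(beta)
    for xc in cases:
        for xn in controls:
            diff = [c - n for c, n in zip(xc, xn)]
            s = sigmoid(sum(b * d for b, d in zip(beta, diff)) / scale)
            weight = s * (1.0 - s) / scale
            for j, d in enumerate(diff):
                grad[j] += weight * d
    n_pairs = len(cases) * len(controls)
    return [g / n_pairs for g in grad]


def threshold_gradient_auc(cases, controls, threshold=0.9, step=0.05,
                           n_steps=200, scale=0.5):
    """Threshold gradient ascent on the smoothed AUC (simplified sketch).

    At each step only coordinates with |gradient| >= threshold * max|gradient|
    are updated, which keeps weakly informative coefficients at zero.
    """
    beta = [0.0] * len(cases[0])
    for _ in range(n_steps):
        grad = smoothed_auc_gradient(beta, cases, controls, scale)
        gmax = max(abs(g) for g in grad)
        if gmax == 0.0:
            break
        for j, g in enumerate(grad):
            if abs(g) >= threshold * gmax:
                beta[j] += step * g
    return beta


# Hypothetical toy data: positive intensities for two candidate markers.
# Marker 0 separates the groups; marker 1 carries almost no signal.
cases = [[2.0 + 0.1 * i, 1.0 + 0.1 * (i % 3)] for i in range(10)]
controls = [[0.5 + 0.1 * i, 1.0 + 0.1 * ((i + 1) % 3)] for i in range(10)]

beta_hat = threshold_gradient_auc(cases, controls)
auc_hat = empirical_auc(beta_hat, cases, controls)
```

In this toy setting the thresholding step leaves the coefficient of the uninformative marker at zero, which is the sparsity behaviour that makes the fitted combination easy to interpret. In practice the threshold and the number of steps would be tuned, for example by cross-validation as the abstract describes.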
first_indexed 2024-12-12T05:07:11Z
format Article
id doaj.art-d017ef46ca104c02a88185c67693a8fe
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-12T05:07:11Z
publishDate 2007-12-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-d017ef46ca104c02a88185c67693a8fe 2022-12-22T00:37:04Z eng BMC BMC Bioinformatics 1471-2105 2007-12-01 8 1 477 10.1186/1471-2105-8-477 On the analysis of glycomics mass spectrometry data via the regularized area under the ROC curve Lebrilla Carlito B; Kirmiz Crystal; Liu Hao; Ye Jingjing; Rocke David M http://www.biomedcentral.com/1471-2105/8/477
title On the analysis of glycomics mass spectrometry data via the regularized area under the ROC curve
url http://www.biomedcentral.com/1471-2105/8/477