cutpointr: Improved Estimation and Validation of Optimal Cutpoints in R

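"Optimal cutpoints" for binary classification tasks are often established by testing which cutpoint yields the best discrimination in a specific sample, as measured, for example, by the Youden index. This results in "optimal" cutpoints that are highly variable and systematically overestimate the out-of-sample performance. To address these concerns, the cutpointr package offers robust methods for estimating optimal cutpoints and the out-of-sample performance. The robust methods include bootstrapping and smoothing based on kernel estimation, generalized additive models, smoothing splines, and local regression. These methods can be applied to a wide range of binary-classification and cost-based metrics. cutpointr also provides mechanisms for using user-defined metrics and estimation methods. The package can parallelize the bootstrapping, including reproducible random number generation. Furthermore, it is pipe-friendly, which makes it compatible with, for example, functions from the tidyverse. Various functions for plotting receiver operating characteristic curves, precision-recall graphs, bootstrap results and other representations of the data are included. The package contains example data from a study on psychological characteristics and suicide attempts that are suitable for applying binary classification algorithms.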
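The in-sample problem described above and the bootstrap remedy can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions. It is not the cutpointr R API and not the authors' implementation.

- The helper names (`youden_at`, `optimal_cutpoint`, `bootstrap_cutpoint`) and the simulated normal data are hypothetical.
- Higher scores are assumed to indicate the positive class.
- The bootstrap variant shown averages resampled cutpoints and scores each one on its out-of-bag observations. It is one generic form of bootstrapping, not necessarily the procedure cutpointr uses.

```python
# Illustrative sketch only: not the cutpointr API; names and data are hypothetical.
import random
from statistics import mean


def youden_at(cut, x, y):
    """Youden index J = sensitivity + specificity - 1 for the rule 'x >= cut -> positive'.

    Assumes y is coded 1 (positive) / 0 (negative) and both classes are present.
    """
    tp = fn = tn = fp = 0
    for xi, yi in zip(x, y):
        if yi == 1:
            if xi >= cut:
                tp += 1
            else:
                fn += 1
        elif xi >= cut:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn) + tn / (tn + fp) - 1


def optimal_cutpoint(x, y):
    """In-sample search: every observed value is a candidate; keep the one maximizing J."""
    return max(sorted(set(x)), key=lambda c: youden_at(c, x, y))


def bootstrap_cutpoint(x, y, runs=100, seed=42):
    """Average the optimal cutpoint over bootstrap resamples and score each
    resampled cutpoint on the observations left out of that resample (out-of-bag)."""
    rng = random.Random(seed)  # seeded so results are reproducible
    n = len(x)
    cuts, oob_j = [], []
    for _ in range(runs):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n indices with replacement
        by = [y[i] for i in idx]
        if len(set(by)) < 2:
            continue  # J is undefined unless both classes are present
        c = optimal_cutpoint([x[i] for i in idx], by)
        cuts.append(c)
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        oy = [y[i] for i in oob]
        if len(set(oy)) == 2:
            oob_j.append(youden_at(c, [x[i] for i in oob], oy))
    return mean(cuts), mean(oob_j)


if __name__ == "__main__":
    # Hypothetical simulated scores: negatives ~ N(0, 1), positives ~ N(1, 1).
    gen = random.Random(0)
    x = [gen.gauss(0, 1) for _ in range(50)] + [gen.gauss(1, 1) for _ in range(50)]
    y = [0] * 50 + [1] * 50
    c_in = optimal_cutpoint(x, y)
    print("in-sample cutpoint", round(c_in, 3), "J", round(youden_at(c_in, x, y), 3))
    c_boot, j_oob = bootstrap_cutpoint(x, y)
    print("bootstrap cutpoint", round(c_boot, 3), "mean out-of-bag J", round(j_oob, 3))
```

Comparing the in-sample Youden index with the mean out-of-bag value gives a rough sense of how optimistic the apparent performance is. The printed numbers depend on the simulated sample and are not results from the article.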

Bibliographic Details
Main Authors: Christian Thiele, Gerrit Hirschfeld
Format: Article
Language: English
Published: Foundation for Open Access Statistics, 2021-06-01
Series: Journal of Statistical Software
Subjects: optimal cutpoint, ROC curve, bootstrap, R
Online Access: https://www.jstatsoft.org/index.php/jss/article/view/3429
ISSN: 1548-7660
DOI: 10.18637/jss.v098.i11
Collection: DOAJ (Directory of Open Access Journals)