Supersparse linear integer models for optimized medical scoring systems

Scoring systems are linear classification models that only require users to add, subtract and multiply a few small numbers in order to make a prediction. These models are in widespread use by the medical community, but are difficult to learn from data because they need to be accurate and sparse, hav...


Bibliographic Details
Main Authors: Ustun, Berk, Rudin, Cynthia
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: English
Published: Springer US 2016
Online Access: http://hdl.handle.net/1721.1/103141
https://orcid.org/0000-0001-5188-3155
collection MIT
description Scoring systems are linear classification models that only require users to add, subtract and multiply a few small numbers in order to make a prediction. These models are in widespread use by the medical community, but are difficult to learn from data because they need to be accurate and sparse, have coprime integer coefficients, and satisfy multiple operational constraints. We present a new method for creating data-driven scoring systems called a Supersparse Linear Integer Model (SLIM). SLIM scoring systems are built by using an integer programming problem that directly encodes measures of accuracy (the 0–1 loss) and sparsity (the ℓ₀-seminorm) while restricting coefficients to coprime integers. SLIM can seamlessly incorporate a wide range of operational constraints related to accuracy and sparsity, and can produce acceptable models without parameter tuning because of the direct control provided over these quantities. We provide bounds on the testing and training accuracy of SLIM scoring systems, and present a new data reduction technique that can improve scalability by eliminating a portion of the training data beforehand. Our paper includes results from a collaboration with the Massachusetts General Hospital Sleep Laboratory, where SLIM is being used to create a highly tailored scoring system for sleep apnea screening.
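The quantities named in the description can be illustrated with a small sketch. This is not the authors' integer program or solver: it only evaluates a given scoring system on toy data. The feature values, labels, coefficients, the threshold at zero, the trade-off weight `c0`, and all function names are assumptions made for illustration, and coprimality is read here as "the nonzero coefficients have greatest common divisor 1".

```python
from functools import reduce
from math import gcd


def score(coefs, intercept, x):
    """Total points for one feature vector x under integer coefficients."""
    return intercept + sum(c * v for c, v in zip(coefs, x))


def predict(coefs, intercept, x):
    """Predict +1 when the score is positive, otherwise -1 (assumed threshold)."""
    return 1 if score(coefs, intercept, x) > 0 else -1


def zero_one_loss(coefs, intercept, X, y):
    """Fraction of examples the scoring system misclassifies (0-1 loss)."""
    errors = sum(predict(coefs, intercept, x) != label for x, label in zip(X, y))
    return errors / len(y)


def l0_size(coefs):
    """Number of nonzero coefficients (the l0 count used as a sparsity measure)."""
    return sum(1 for c in coefs if c != 0)


def are_coprime(coefs):
    """True when the nonzero coefficients share no common factor above 1."""
    nonzero = [abs(c) for c in coefs if c != 0]
    return bool(nonzero) and reduce(gcd, nonzero) == 1


def objective(coefs, intercept, X, y, c0):
    """Simplified accuracy/sparsity trade-off: 0-1 loss plus c0 times the l0 count."""
    return zero_one_loss(coefs, intercept, X, y) + c0 * l0_size(coefs)


# Hypothetical toy data: three binary features, labels in {+1, -1}.
X = [(1, 0, 1), (0, 1, 0), (1, 1, 0), (0, 0, 1)]
y = [1, -1, 1, -1]

# Hypothetical scoring system: small integers, second feature unused.
coefs = (2, 0, -1)
intercept = 0

if __name__ == "__main__":
    print("loss:", zero_one_loss(coefs, intercept, X, y))
    print("nonzero coefficients:", l0_size(coefs))
    print("coprime:", are_coprime(coefs))
    print("objective:", objective(coefs, intercept, X, y, c0=0.01))
```

A method of the kind described would search over integer coefficient vectors to minimize such an objective under additional constraints; the sketch only shows how a single candidate would be scored.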
first_indexed 2024-09-23T08:59:08Z
format Article
id mit-1721.1/103141
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T08:59:08Z
publishDate 2016
publisher Springer US
record_format dspace
spelling mit-1721.1/103141 2022-09-30T12:40:43Z
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Sloan School of Management
Siemens Aktiengesellschaft
Wistron Corporation
2016-06-17T17:10:35Z 2017-03-01T16:14:47Z 2015-11 2015-02 2016-05-23T12:15:08Z
Article
http://purl.org/eprint/type/JournalArticle
0885-6125
1573-0565
http://hdl.handle.net/1721.1/103141
Ustun, Berk, and Cynthia Rudin. “Supersparse Linear Integer Models for Optimized Medical Scoring Systems.” Machine Learning 102.3 (2016): 349–391.
https://orcid.org/0000-0001-5188-3155
en
http://dx.doi.org/10.1007/s10994-015-5528-6
Machine Learning
Creative Commons Attribution-Noncommercial-Share Alike
http://creativecommons.org/licenses/by-nc-sa/4.0/
The Author(s)
application/pdf
Springer US