Supersparse linear integer models for optimized medical scoring systems
Main Authors: | Ustun, Berk, Rudin, Cynthia |
---|---|
Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
Format: | Article |
Language: | English |
Published: | Springer US 2016 |
Online Access: | http://hdl.handle.net/1721.1/103141 https://orcid.org/0000-0001-5188-3155 |
_version_ | 1826191644293070848 |
---|---|
author | Ustun, Berk; Rudin, Cynthia |
author2 | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
collection | MIT |
description | Scoring systems are linear classification models that only require users to add, subtract and multiply a few small numbers in order to make a prediction. These models are in widespread use by the medical community, but are difficult to learn from data because they need to be accurate and sparse, have coprime integer coefficients, and satisfy multiple operational constraints. We present a new method for creating data-driven scoring systems called a Supersparse Linear Integer Model (SLIM). SLIM scoring systems are built by using an integer programming problem that directly encodes measures of accuracy (the 0–1 loss) and sparsity (the ℓ₀-seminorm) while restricting coefficients to coprime integers. SLIM can seamlessly incorporate a wide range of operational constraints related to accuracy and sparsity, and can produce acceptable models without parameter tuning because of the direct control provided over these quantities. We provide bounds on the testing and training accuracy of SLIM scoring systems, and present a new data reduction technique that can improve scalability by eliminating a portion of the training data beforehand. Our paper includes results from a collaboration with the Massachusetts General Hospital Sleep Laboratory, where SLIM is being used to create a highly tailored scoring system for sleep apnea screening. |
first_indexed | 2024-09-23T08:59:08Z |
format | Article |
id | mit-1721.1/103141 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T08:59:08Z |
publishDate | 2016 |
publisher | Springer US |
record_format | dspace |
spelling | mit-1721.1/103141; 2022-09-30T12:40:43Z; Supersparse linear integer models for optimized medical scoring systems; Ustun, Berk; Rudin, Cynthia; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Sloan School of Management; Siemens Aktiengesellschaft; Wistron Corporation; 2016-06-17T17:10:35Z; 2017-03-01T16:14:47Z; 2015-11; 2015-02; 2016-05-23T12:15:08Z; Article; http://purl.org/eprint/type/JournalArticle; 0885-6125; 1573-0565; http://hdl.handle.net/1721.1/103141; Ustun, Berk, and Cynthia Rudin. “Supersparse Linear Integer Models for Optimized Medical Scoring Systems.” Machine Learning 102.3 (2016): 349–391; https://orcid.org/0000-0001-5188-3155; en; http://dx.doi.org/10.1007/s10994-015-5528-6; Machine Learning; Creative Commons Attribution-Noncommercial-Share Alike; http://creativecommons.org/licenses/by-nc-sa/4.0/; The Author(s); application/pdf; Springer US |
title | Supersparse linear integer models for optimized medical scoring systems |
url | http://hdl.handle.net/1721.1/103141 https://orcid.org/0000-0001-5188-3155 |
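
The description above says a SLIM scoring system trades off the 0–1 loss against the ℓ₀-seminorm over small integer coefficients. The sketch below illustrates that kind of objective on a toy dataset. It is not the authors' method: it swaps the integer program for brute-force enumeration, and the function names, the coefficient range, the penalty `c0`, and the small ℓ₁ tie-breaker `eps` (used here to prefer the smallest, and therefore coprime, coefficients) are all assumptions made for illustration.

```python
from functools import reduce
from itertools import product
from math import gcd


def zero_one_loss(coefs, X, y):
    """Fraction of points with y * score <= 0 (a score of 0 counts as an error)."""
    errors = 0
    for xi, yi in zip(X, y):
        score = sum(c * v for c, v in zip(coefs, xi))
        if yi * score <= 0:
            errors += 1
    return errors / len(X)


def fit_tiny_scoring_system(X, y, coef_range=range(-3, 4), c0=0.01, eps=1e-4):
    """Enumerate every integer coefficient vector in coef_range (hypothetical
    helper) and keep the one minimizing: 0-1 loss + c0 * l0 + eps * l1."""
    best, best_obj = None, float("inf")
    for coefs in product(coef_range, repeat=len(X[0])):
        l0 = sum(1 for c in coefs if c != 0)  # number of nonzero coefficients
        l1 = sum(abs(c) for c in coefs)       # tie-breaker toward small integers
        obj = zero_one_loss(coefs, X, y) + c0 * l0 + eps * l1
        if obj < best_obj:
            best, best_obj = coefs, obj
    return best, best_obj


# Toy data: columns are (intercept, a, b); the label depends only on a.
X = [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
y = [-1, -1, 1, 1]

best, best_obj = fit_tiny_scoring_system(X, y)
print(best)  # (-1, 2, 0): score = -1 + 2*a, and b is dropped
```

Enumeration grows exponentially with the number of features, so this only works for a handful of them; the record describes an integer programming formulation for realistic problems.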