A compressed large language model embedding dataset of ICD 10 CM descriptions

Abstract: This paper presents novel datasets providing numerical representations of ICD-10-CM codes by generating description embeddings using a large language model followed by dimension reduction via an autoencoder. The embeddings serve as informative input features for machine learning models by ca...

Bibliographic Details
Main Authors: Michael J. Kane, Casey King, Denise Esserman, Nancy K. Latham, Erich J. Greene, David A. Ganz
Format: Article
Language: English
Published: BMC 2023-12-01
Series: BMC Bioinformatics
Online Access: https://doi.org/10.1186/s12859-023-05597-2