Patient-Level Cancer Prediction Models From a Nationwide Patient Cohort: Model Development and Validation

Background: Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models at the patient level, and claim data are one of the more useful resources to this end. To avoid unnecessary diagnostic intervention after cancer screening tests,...

Bibliographic Details
Main Authors:	Eunsaem Lee, Se Young Jung, Hyung Ju Hwang, Jaewoo Jung
Format:	Article
Language:	English
Published:	JMIR Publications 2021-08-01
Series:	JMIR Medical Informatics
Online Access:	https://medinform.jmir.org/2021/8/e29807

_version_	1797735752591736832
author	Eunsaem Lee Se Young Jung Hyung Ju Hwang Jaewoo Jung
author_facet	Eunsaem Lee Se Young Jung Hyung Ju Hwang Jaewoo Jung
author_sort	Eunsaem Lee
collection	DOAJ
description	Background: Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models at the patient level, and claim data are one of the more useful resources to this end. To avoid unnecessary diagnostic intervention after cancer screening tests, patient-level prediction models should be developed. Objective: We aimed to develop cancer prediction models using nationwide claim databases with machine learning algorithms, which are explainable and easily applicable in real-world environments. Methods: As source data, we used the Korean National Insurance System Database. Every Korean aged ≥40 years undergoes a national health checkup every 2 years. We gathered all variables from the database, including demographic information, basic laboratory values, anthropometric values, and previous medical history. We applied conventional logistic regression methods, light gradient boosting methods, neural networks, survival analysis, and one-class embedding classifier methods to effectively analyze high-dimensional data based on deep learning–based anomaly detection. Performance was measured with area under the curve and area under the precision-recall curve. We validated our models externally with a health checkup database from a tertiary hospital. Results: The one-class embedding classifier model achieved the highest area under the curve scores, with values of 0.868, 0.849, 0.798, 0.746, 0.800, 0.749, and 0.790 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. For area under the precision-recall curve, the light gradient boosting models had the highest scores, with values of 0.383, 0.401, 0.387, 0.300, 0.385, 0.357, and 0.296 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. Conclusions: Our results show that it is possible to easily develop applicable cancer prediction models with nationwide claim data using machine learning. The 7 models showed acceptable performance and explainability, and thus can be distributed easily in real-world environments.
first_indexed	2024-03-12T13:03:39Z
format	Article
id	doaj.art-95a51cbcf77e49ebb925a508ffe64802
institution	Directory of Open Access Journals
issn	2291-9694
language	English
last_indexed	2024-03-12T13:03:39Z
publishDate	2021-08-01
publisher	JMIR Publications
record_format	Article
series	JMIR Medical Informatics
spelling	doaj.art-95a51cbcf77e49ebb925a508ffe64802; 2023-08-28T18:42:17Z; eng; JMIR Publications; JMIR Medical Informatics; 2291-9694; 2021-08-01; 9; 8; e29807; 10.2196/29807; Patient-Level Cancer Prediction Models From a Nationwide Patient Cohort: Model Development and Validation; Eunsaem Lee (https://orcid.org/0000-0001-9606-3230); Se Young Jung (https://orcid.org/0000-0001-9946-8807); Hyung Ju Hwang (https://orcid.org/0000-0002-3678-2687); Jaewoo Jung (https://orcid.org/0000-0002-6340-3275); https://medinform.jmir.org/2021/8/e29807
spellingShingle	Eunsaem Lee Se Young Jung Hyung Ju Hwang Jaewoo Jung Patient-Level Cancer Prediction Models From a Nationwide Patient Cohort: Model Development and Validation JMIR Medical Informatics
title	Patient-Level Cancer Prediction Models From a Nationwide Patient Cohort: Model Development and Validation
title_sort	patient level cancer prediction models from a nationwide patient cohort model development and validation
url	https://medinform.jmir.org/2021/8/e29807

Similar Items