Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project

Bibliographic Details
Main Authors: Nicholas I-Hsien Kuo, Oscar Perez-Concha, Mark Hanly, Emmanuel Mnatzaganian, Brandon Hao, Marcus Di Sipio, Guolin Yu, Jash Vanjara, Ivy Cerelia Valerie, Juliana de Oliveira Costa, Timothy Churches, Sanja Lujic, Jo Hegarty, Louisa Jorm, Sebastiano Barbieri
Format: Article
Language: English
Published: JMIR Publications 2024-01-01
Series: JMIR Medical Education
Online Access: https://mededu.jmir.org/2024/1/e51388
collection DOAJ
description Large-scale medical data sets are vital for hands-on education in health data science but are often inaccessible due to privacy concerns. Addressing this gap, we developed the Health Gym project, a free and open-source platform designed to generate synthetic health data sets applicable to various areas of data science education, including machine learning, data visualization, and traditional statistical models. Initially, we generated 3 synthetic data sets for sepsis, acute hypotension, and antiretroviral therapy for HIV infection. This paper discusses the educational applications of Health Gym’s synthetic data sets. We illustrate this through their use in postgraduate health data science courses delivered by the University of New South Wales, Australia, and a Datathon event, involving academics, students, clinicians, and local health district professionals. We also include adaptable worked examples using our synthetic data sets, designed to enrich hands-on tutorial and workshop experiences. Although we highlight the potential of these data sets in advancing data science education and health care artificial intelligence, we also emphasize the need for continued research into the inherent limitations of synthetic data.
first_indexed 2024-03-08T13:37:24Z
format Article
id doaj.art-6be48b88d46a403192ae2284e7f1b73f
institution Directory Open Access Journal
issn 2369-3762
spelling doaj.art-6be48b88d46a403192ae2284e7f1b73f
2024-01-16T15:00:35Z
eng
JMIR Publications
JMIR Medical Education
2369-3762
2024-01-01
10
e51388
10.2196/51388
Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project
Nicholas I-Hsien Kuo https://orcid.org/0000-0001-8749-7280
Oscar Perez-Concha https://orcid.org/0000-0002-8823-7090
Mark Hanly https://orcid.org/0000-0002-9279-7453
Emmanuel Mnatzaganian https://orcid.org/0009-0009-5091-2642
Brandon Hao https://orcid.org/0009-0009-6237-1783
Marcus Di Sipio https://orcid.org/0009-0007-9271-755X
Guolin Yu https://orcid.org/0009-0008-9382-1882
Jash Vanjara https://orcid.org/0009-0003-3524-0696
Ivy Cerelia Valerie https://orcid.org/0000-0001-6361-1587
Juliana de Oliveira Costa https://orcid.org/0000-0002-8355-023X
Timothy Churches https://orcid.org/0000-0002-7905-5877
Sanja Lujic https://orcid.org/0000-0002-9555-0261
Jo Hegarty https://orcid.org/0009-0001-7445-2179
Louisa Jorm https://orcid.org/0000-0003-0390-661X
Sebastiano Barbieri https://orcid.org/0000-0002-5919-372X
https://mededu.jmir.org/2024/1/e51388