Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project
Large-scale medical data sets are vital for hands-on education in health data science but are often inaccessible due to privacy concerns. Addressing this gap, we developed the Health Gym project, a free and open-source platform designed to generate synthetic health data sets applicable to...
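Main Authors: | Nicholas I-Hsien Kuo, Oscar Perez-Concha, Mark Hanly, Emmanuel Mnatzaganian, Brandon Hao, Marcus Di Sipio, Guolin Yu, Jash Vanjara, Ivy Cerelia Valerie, Juliana de Oliveira Costa, Timothy Churches, Sanja Lujic, Jo Hegarty, Louisa Jorm, Sebastiano Barbieri |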
---|---|
Format: | Article |
Language: | English |
Published: | JMIR Publications 2024-01-01 |
Series: | JMIR Medical Education |
Online Access: | https://mededu.jmir.org/2024/1/e51388 |
_version_ | 1797353882236485632 |
---|---|
author | Nicholas I-Hsien Kuo, Oscar Perez-Concha, Mark Hanly, Emmanuel Mnatzaganian, Brandon Hao, Marcus Di Sipio, Guolin Yu, Jash Vanjara, Ivy Cerelia Valerie, Juliana de Oliveira Costa, Timothy Churches, Sanja Lujic, Jo Hegarty, Louisa Jorm, Sebastiano Barbieri |
author_facet | Nicholas I-Hsien Kuo, Oscar Perez-Concha, Mark Hanly, Emmanuel Mnatzaganian, Brandon Hao, Marcus Di Sipio, Guolin Yu, Jash Vanjara, Ivy Cerelia Valerie, Juliana de Oliveira Costa, Timothy Churches, Sanja Lujic, Jo Hegarty, Louisa Jorm, Sebastiano Barbieri |
author_sort | Nicholas I-Hsien Kuo |
collection | DOAJ |
description |
Large-scale medical data sets are vital for hands-on education in health data science but are often inaccessible due to privacy concerns. Addressing this gap, we developed the Health Gym project, a free and open-source platform designed to generate synthetic health data sets applicable to various areas of data science education, including machine learning, data visualization, and traditional statistical models. Initially, we generated 3 synthetic data sets for sepsis, acute hypotension, and antiretroviral therapy for HIV infection. This paper discusses the educational applications of Health Gym’s synthetic data sets. We illustrate this through their use in postgraduate health data science courses delivered by the University of New South Wales, Australia, and a Datathon event, involving academics, students, clinicians, and local health district professionals. We also include adaptable worked examples using our synthetic data sets, designed to enrich hands-on tutorial and workshop experiences. Although we highlight the potential of these data sets in advancing data science education and health care artificial intelligence, we also emphasize the need for continued research into the inherent limitations of synthetic data. |
first_indexed | 2024-03-08T13:37:24Z |
format | Article |
id | doaj.art-6be48b88d46a403192ae2284e7f1b73f |
institution | Directory Open Access Journal |
issn | 2369-3762 |
language | English |
last_indexed | 2024-03-08T13:37:24Z |
publishDate | 2024-01-01 |
publisher | JMIR Publications |
record_format | Article |
series | JMIR Medical Education |
spelling | doaj.art-6be48b88d46a403192ae2284e7f1b73f 2024-01-16T15:00:35Z eng JMIR Publications JMIR Medical Education 2369-3762 2024-01-01 10 e51388 10.2196/51388 Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project Nicholas I-Hsien Kuo https://orcid.org/0000-0001-8749-7280 Oscar Perez-Concha https://orcid.org/0000-0002-8823-7090 Mark Hanly https://orcid.org/0000-0002-9279-7453 Emmanuel Mnatzaganian https://orcid.org/0009-0009-5091-2642 Brandon Hao https://orcid.org/0009-0009-6237-1783 Marcus Di Sipio https://orcid.org/0009-0007-9271-755X Guolin Yu https://orcid.org/0009-0008-9382-1882 Jash Vanjara https://orcid.org/0009-0003-3524-0696 Ivy Cerelia Valerie https://orcid.org/0000-0001-6361-1587 Juliana de Oliveira Costa https://orcid.org/0000-0002-8355-023X Timothy Churches https://orcid.org/0000-0002-7905-5877 Sanja Lujic https://orcid.org/0000-0002-9555-0261 Jo Hegarty https://orcid.org/0009-0001-7445-2179 Louisa Jorm https://orcid.org/0000-0003-0390-661X Sebastiano Barbieri https://orcid.org/0000-0002-5919-372X Large-scale medical data sets are vital for hands-on education in health data science but are often inaccessible due to privacy concerns. Addressing this gap, we developed the Health Gym project, a free and open-source platform designed to generate synthetic health data sets applicable to various areas of data science education, including machine learning, data visualization, and traditional statistical models. Initially, we generated 3 synthetic data sets for sepsis, acute hypotension, and antiretroviral therapy for HIV infection. This paper discusses the educational applications of Health Gym’s synthetic data sets. We illustrate this through their use in postgraduate health data science courses delivered by the University of New South Wales, Australia, and a Datathon event, involving academics, students, clinicians, and local health district professionals. We also include adaptable worked examples using our synthetic data sets, designed to enrich hands-on tutorial and workshop experiences. Although we highlight the potential of these data sets in advancing data science education and health care artificial intelligence, we also emphasize the need for continued research into the inherent limitations of synthetic data. https://mededu.jmir.org/2024/1/e51388 |
spellingShingle | Nicholas I-Hsien Kuo Oscar Perez-Concha Mark Hanly Emmanuel Mnatzaganian Brandon Hao Marcus Di Sipio Guolin Yu Jash Vanjara Ivy Cerelia Valerie Juliana de Oliveira Costa Timothy Churches Sanja Lujic Jo Hegarty Louisa Jorm Sebastiano Barbieri Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project JMIR Medical Education |
title | Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project |
title_full | Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project |
title_fullStr | Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project |
title_full_unstemmed | Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project |
title_short | Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project |
title_sort | enriching data science and health care education application and impact of synthetic data sets through the health gym project |
url | https://mededu.jmir.org/2024/1/e51388 |