Development and validation of the SickKids Enterprise-wide Data in Azure Repository (SEDAR)

Bibliographic Details
Main Authors: Lin Lawrence Guo, Maryann Calligan, Emily Vettese, Sadie Cook, George Gagnidze, Oscar Han, Jiro Inoue, Joshua Lemmon, Johnson Li, Medhat Roshdi, Bohdan Sadovy, Steven Wallace, Lillian Sung
Format: Article
Language: English
Published: Elsevier 2023-11-01
Series: Heliyon
Subjects: Electronic health records; Microsoft Azure; Schema; Validation; OMOP-CDM
Online Access: http://www.sciencedirect.com/science/article/pii/S2405844023087947
collection DOAJ
description Objectives: To describe the processes developed by The Hospital for Sick Children (SickKids) to enable utilization of electronic health record (EHR) data by creating sequentially transformed schemas for use across multiple user types.
Methods: We used Microsoft Azure as the cloud service provider and named this effort the SickKids Enterprise-wide Data in Azure Repository (SEDAR). On-premises Epic Clarity data were copied to a virtual network in Microsoft Azure. Three sequential schemas were developed. The Filtered Schema added a filter to retain only SickKids and valid patients. The Curated Schema created a data structure that was easier to navigate and query. Each table contained a logical unit such as patients, hospital encounters or laboratory tests. Data validation of randomly sampled observations in the Curated Schema was performed. The SK-OMOP Schema was designed to facilitate research and machine learning. Two individuals mapped medical elements to standard Observational Medical Outcomes Partnership (OMOP) concepts.
Results: A copy of Clarity data was transferred to Microsoft Azure and updated each night using log shipping. The Filtered Schema and Curated Schema were implemented as stored procedures and executed each night with incremental updates or full loads. Data validation required up to 16 iterations for each Curated Schema table. OMOP concept mapping achieved at least 80% coverage for each SK-OMOP table.
Conclusions: We described our experience in creating three sequential schemas to address different EHR data access requirements. Future work should consider replicating this approach at other institutions to determine whether it is generalizable.
first_indexed 2024-03-09T09:18:34Z
format Article
id doaj.art-22dcf145eb9b4594ba52fc879c37547e
institution Directory Open Access Journal
issn 2405-8440
language English
last_indexed 2024-03-09T09:18:34Z
publishDate 2023-11-01
publisher Elsevier
record_format Article
series Heliyon
spelling doaj.art-22dcf145eb9b4594ba52fc879c37547e; 2023-12-02T07:03:15Z; eng; Elsevier; Heliyon; 2405-8440; 2023-11-01; volume 9, issue 11, article e21586
Development and validation of the SickKids Enterprise-wide Data in Azure Repository (SEDAR)
Author affiliations:
Lin Lawrence Guo, Maryann Calligan, Emily Vettese, Sadie Cook, Jiro Inoue, Joshua Lemmon: Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Canada
George Gagnidze, Oscar Han, Johnson Li, Medhat Roshdi, Bohdan Sadovy, Steven Wallace: Information Management Technology, The Hospital for Sick Children, Toronto, Canada
Lillian Sung: Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Canada; Division of Haematology/Oncology, The Hospital for Sick Children, Toronto, Canada
Corresponding author: Lillian Sung, The Division of Haematology/Oncology, The Hospital for Sick Children, 555 University Avenue, Toronto, Ontario, M5G1X8, Canada.
title Development and validation of the SickKids Enterprise-wide Data in Azure Repository (SEDAR)
topic Electronic health records
Microsoft Azure
Schema
Validation
OMOP-CDM
url http://www.sciencedirect.com/science/article/pii/S2405844023087947