The OMOP common data model in Australian primary care data: Building a quality research ready harmonised dataset.

Bibliographic Details
Main Authors: Roger Ward, Christine Mary Hallinan, David Ormiston-Smith, Christine Chidgey, Dougie Boyle
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0301557&type=printable
collection DOAJ
description <h4>Background</h4>The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and 'validation' analyses across multiple institutions in Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.
<h4>Methods</h4>We used standard structured query language (SQL) to construct extract, transform, and load (ETL) scripts to convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. This resulted in a number of terms that required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset, we used the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.
<h4>Results</h4>Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A 'FAIL' occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database, we achieved an overall pass rate of 97%.
<h4>Conclusion</h4>The OMOP CDM's widespread international use, support, and training provide a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitate adoption and integration into existing data processes.
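The two rules stated in the abstract (a per-domain frequency threshold chosen so that more than 95% of records are covered, and a 'FAIL' when the percentage of non-compliant rows exceeds a check's threshold) can be illustrated with a minimal Python sketch. This is not the authors' ETL code and not the Data Quality Dashboard's API; the function names, the example term counts, and the 5% check threshold are all hypothetical and serve only to make the arithmetic concrete.

```python
from collections import Counter


def min_frequency_for_coverage(term_counts, target=0.95):
    """Return (cutoff, coverage) for one domain.

    `cutoff` is the highest frequency such that mapping every term that
    occurs at least `cutoff` times covers MORE than `target` of all records.
    Hypothetical helper; assumes 0 <= target < 1.
    """
    total = sum(term_counts.values())
    if total == 0:
        raise ValueError("domain has no records")
    if not 0 <= target < 1:
        raise ValueError("target must be in [0, 1)")

    # How many distinct terms share each frequency (handles ties).
    terms_per_frequency = Counter(term_counts.values())

    covered = 0
    for frequency in sorted(terms_per_frequency, reverse=True):
        covered += frequency * terms_per_frequency[frequency]
        if covered / total > target:
            return frequency, covered / total
    # Not reached for target < 1: the last step covers every record.
    raise AssertionError("unreachable")


def check_status(non_compliant_rows, total_rows, threshold_pct):
    """Return 'FAIL' if the share of non-compliant rows exceeds the threshold."""
    pct = 100.0 * non_compliant_rows / total_rows if total_rows else 0.0
    return "FAIL" if pct > threshold_pct else "PASS"


def pass_rate(statuses):
    """Fraction of checks whose status is 'PASS'."""
    return sum(s == "PASS" for s in statuses) / len(statuses)


# Made-up free-text terms and record counts for a single domain.
example_counts = {
    "hypertension": 600,
    "asthma": 300,
    "t2dm": 80,
    "htn ?": 15,
    "asthmaa": 5,
}
cutoff, coverage = min_frequency_for_coverage(example_counts)

# Made-up check results against an assumed 5% threshold.
statuses = [check_status(3, 100, 5), check_status(6, 100, 5)]
```

With these made-up counts, mapping only the terms that occur at least 80 times already links 98% of records, so the two rare misspellings would not need manual review. The strict inequalities mirror the abstract's wording ("more than 95%", "exceeded the specified threshold").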
first_indexed 2024-04-24T06:03:23Z
format Article
id doaj.art-a4f1d0ba63b640c4a6ac7889e97bd2f4
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-04-24T06:03:23Z
publishDate 2024-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-a4f1d0ba63b640c4a6ac7889e97bd2f4 2024-04-23T05:31:49Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2024-01-01 19(4): e0301557 doi:10.1371/journal.pone.0301557