Analysis of primary care computerized medical records (CMR) data with deep autoencoders (DAE)

Bibliographic Details
Main Authors: Thomas, SA, Smith, NA, Livina, V, Yonova, I, Webb, R, de Lusignan, S
Format: Journal article
Language: English
Published: Frontiers Media 2019
Description: Deep learning is becoming increasingly important in the analysis of medical data, for example in pattern recognition for classification. Primary healthcare computerized medical records (CMR) data are vital for predicting infection prevalence across a population and for decision making at a national scale. To date, machine learning algorithms remain under-utilized on CMR data, despite their potential impact in diagnostics or in the prevention of epidemics such as influenza outbreaks. A particular challenge in epidemiology is how to differentiate incident cases from follow-ups for the same condition. Furthermore, CMR data are typically heterogeneous, noisy, high-dimensional and incomplete, making automated analysis difficult. We introduce a methodology for converting heterogeneous data so that they are compatible with a deep autoencoder for reduction of CMR data. This approach provides a tool for real-time visualization of these high-dimensional data, revealing previously unknown dependencies and clusters. Our unsupervised nonlinear reduction method can be used to identify the features driving the formation of these clusters, which can aid decision making in healthcare applications. The results demonstrate that our methods can cluster more than 97.84% of the data (clusters >5 points), with each cluster uniquely described by three attributes in the data: Clinical System (CMR system), Read Code (as recorded) and Read Term (standardized coding). Further, we propose Shannon Entropy as a means to analyse the dispersion of clusters and the contribution from the underlying attributes, to gain further insight from the data. Our results demonstrate that Shannon Entropy is a useful metric for analysing both the low-dimensional clusters of CMR data and the features in the original heterogeneous data. Finally, we find that the entropy of the low-dimensional clusters is directly representative of the entropy of the input data (Pearson correlation = 0.99, R² = 0.98), and therefore the reduced data from the deep autoencoder reflect the variability of the original CMR data.
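
The abstract names Shannon Entropy as the measure of cluster dispersion but does not give the exact formulation used. As a minimal sketch only, assuming the entropy is taken over the empirical frequency distribution of one categorical attribute (for example the Read Code) within a cluster, it could be computed as below. The function name `shannon_entropy` and the placeholder code labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of `labels`.

    Assumption: each element is the value of one categorical attribute
    (e.g. a Read Code) for one record in a cluster.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over observed values of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical toy clusters; the labels are placeholders, not real Read Codes.
clusters = {
    "homogeneous": ["code_1", "code_1", "code_1", "code_1"],
    "two_values": ["code_1", "code_2", "code_1", "code_2"],
    "all_distinct": ["code_1", "code_2", "code_3", "code_4"],
}

for name, labels in clusters.items():
    print(name, shannon_entropy(labels))
```

Under this assumption, a cluster in which every record shares the same attribute value has zero entropy, and the entropy grows as the attribute values within the cluster become more evenly spread.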
ID: oxford-uuid:1bae4742-adef-414d-ad3a-0a9112077694
Institution: University of Oxford