Data encoding for healthcare data democratization and information leakage prevention

Bibliographic Details
Main Authors: Wang, Y, Armstrong, J, Thakur, A, Zhu, T, Abrol, V, Clifton, DA
Format: Journal article
Language: English
Published: Springer Nature 2024
collection OXFORD
description Information leakage from trained models and the lack of data democratization hinder the development and acceptance of robust deep learning-based healthcare solutions. This paper argues that irreversible data encoding can provide an effective solution to achieve data democratization without violating the privacy constraints imposed on healthcare data and clinical models. An ideal encoding framework transforms the data into a new space where it is imperceptible to manual or computational inspection. However, the encoded data should preserve the semantics of the original data so that deep learning models can still be trained effectively. This paper hypothesizes the characteristics of the desired encoding framework and then uses random projections and random quantum encoding to realize this framework for dense and longitudinal or time-series data. Experimental evaluation shows that models trained on encoded time-series data effectively uphold the information bottleneck principle and hence exhibit less information leakage.
first_indexed 2024-04-09T03:54:35Z
id oxford-uuid:4bd45b18-0527-467b-a33e-b6ab212e4498
institution University of Oxford
last_indexed 2024-04-09T03:54:35Z