Synthetic data use: exploring use cases to optimise data utility

Abstract Synthetic data is a rapidly evolving field with growing interest from multiple industry stakeholders and European bodies. In particular, the pharmaceutical industry is starting to realise the value of synthetic data which is being utilised more prevalently as a method to optimise data utili...

Full description

Bibliographic Details
Main Authors: Stefanie James, Chris Harbron, Janice Branson, Mimmi Sundler
Format: Article
Language:English
Published: Springer 2021-12-01
Series:Discover Artificial Intelligence
Subjects:
Online Access:https://doi.org/10.1007/s44163-021-00016-y
Description
Summary:Abstract Synthetic data is a rapidly evolving field with growing interest from multiple industry stakeholders and European bodies. In particular, the pharmaceutical industry is starting to realise the value of synthetic data which is being utilised more prevalently as a method to optimise data utility and sharing, ultimately as an innovative response to the growing demand for improved privacy. Synthetic data is data generated by simulation, based upon and mirroring properties of an original dataset. Here, with supporting viewpoints from across the pharmaceutical industry, we set out to explore use cases for synthetic data across seven key but relatable areas for optimising data utility for improved data privacy and protection. We also discuss the various methods which can be used to produce a synthetic dataset and availability of metrics to ensure robust quality of generated synthetic datasets. Lastly, we discuss the potential merits, challenges and future direction of synthetic data within the pharmaceutical industry and the considerations for this privacy enhancing technology.
ISSN:2731-0809