Synthetic data use: exploring use cases to optimise data utility

Abstract Synthetic data is a rapidly evolving field with growing interest from multiple industry stakeholders and European bodies. In particular, the pharmaceutical industry is starting to realise the value of synthetic data which is being utilised more prevalently as a method to optimise data utili...


Bibliographic Details
Main Authors: Stefanie James, Chris Harbron, Janice Branson, Mimmi Sundler
Format: Article
Language: English
Published: Springer 2021-12-01
Series: Discover Artificial Intelligence
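Subjects: Synthetic data; Artificial intelligence; Privacy; Privacy enhancing technology; Pharmaceuticals; Software testing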
Online Access: https://doi.org/10.1007/s44163-021-00016-y
_version_ 1819177205925675008
author Stefanie James
Chris Harbron
Janice Branson
Mimmi Sundler
author_sort Stefanie James
collection DOAJ
description Abstract Synthetic data is a rapidly evolving field with growing interest from multiple industry stakeholders and European bodies. In particular, the pharmaceutical industry is starting to realise the value of synthetic data which is being utilised more prevalently as a method to optimise data utility and sharing, ultimately as an innovative response to the growing demand for improved privacy. Synthetic data is data generated by simulation, based upon and mirroring properties of an original dataset. Here, with supporting viewpoints from across the pharmaceutical industry, we set out to explore use cases for synthetic data across seven key but relatable areas for optimising data utility for improved data privacy and protection. We also discuss the various methods which can be used to produce a synthetic dataset and availability of metrics to ensure robust quality of generated synthetic datasets. Lastly, we discuss the potential merits, challenges and future direction of synthetic data within the pharmaceutical industry and the considerations for this privacy enhancing technology.
first_indexed 2024-12-22T21:22:58Z
format Article
id doaj.art-588d0ba6fd3d4dd8b6ab418340d45088
institution Directory of Open Access Journals
issn 2731-0809
language English
last_indexed 2024-12-22T21:22:58Z
publishDate 2021-12-01
publisher Springer
record_format Article
series Discover Artificial Intelligence
spelling doaj.art-588d0ba6fd3d4dd8b6ab418340d45088; 2022-12-21T18:12:08Z; eng; Springer; Discover Artificial Intelligence; 2731-0809; 2021-12-01; 1113; 10.1007/s44163-021-00016-y; Synthetic data use: exploring use cases to optimise data utility; Stefanie James (0), Chris Harbron (1), Janice Branson (2), Mimmi Sundler (3); Data Policy Director, Data Office, Data Science and Artificial Intelligence, Biopharmaceuticals, Research and Development, AstraZeneca, Academy House; Expert Statistical Scientist; Global Head of Advanced Methodology and Data Science, Novartis; Head of Data and AI Governance and Policy R&D, Data Office, Data Science and Artificial Intelligence, Biopharmaceuticals, Research and Development, AstraZeneca; https://doi.org/10.1007/s44163-021-00016-y; Synthetic data; Artificial intelligence; Privacy; Privacy enhancing technology; Pharmaceuticals; Software testing
title Synthetic data use: exploring use cases to optimise data utility
title_sort synthetic data use exploring use cases to optimise data utility
topic Synthetic data
Artificial intelligence
Privacy
Privacy enhancing technology
Pharmaceuticals
Software testing
url https://doi.org/10.1007/s44163-021-00016-y