Synthetic data use: exploring use cases to optimise data utility
Abstract Synthetic data is a rapidly evolving field with growing interest from multiple industry stakeholders and European bodies. In particular, the pharmaceutical industry is starting to realise the value of synthetic data which is being utilised more prevalently as a method to optimise data utili...
Main Authors: | Stefanie James, Chris Harbron, Janice Branson, Mimmi Sundler |
---|---|
Format: | Article |
Language: | English |
Published: | Springer, 2021-12-01 |
Series: | Discover Artificial Intelligence |
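Subjects: | Synthetic data; Artificial intelligence; Privacy; Privacy enhancing technology; Pharmaceuticals; Software testing |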
Online Access: | https://doi.org/10.1007/s44163-021-00016-y |
_version_ | 1819177205925675008 |
---|---|
author | Stefanie James; Chris Harbron; Janice Branson; Mimmi Sundler |
author_facet | Stefanie James; Chris Harbron; Janice Branson; Mimmi Sundler |
author_sort | Stefanie James |
collection | DOAJ |
description | Abstract Synthetic data is a rapidly evolving field with growing interest from multiple industry stakeholders and European bodies. In particular, the pharmaceutical industry is starting to realise the value of synthetic data which is being utilised more prevalently as a method to optimise data utility and sharing, ultimately as an innovative response to the growing demand for improved privacy. Synthetic data is data generated by simulation, based upon and mirroring properties of an original dataset. Here, with supporting viewpoints from across the pharmaceutical industry, we set out to explore use cases for synthetic data across seven key but relatable areas for optimising data utility for improved data privacy and protection. We also discuss the various methods which can be used to produce a synthetic dataset and availability of metrics to ensure robust quality of generated synthetic datasets. Lastly, we discuss the potential merits, challenges and future direction of synthetic data within the pharmaceutical industry and the considerations for this privacy enhancing technology. |
first_indexed | 2024-12-22T21:22:58Z |
format | Article |
id | doaj.art-588d0ba6fd3d4dd8b6ab418340d45088 |
institution | Directory Open Access Journal |
issn | 2731-0809 |
language | English |
last_indexed | 2024-12-22T21:22:58Z |
publishDate | 2021-12-01 |
publisher | Springer |
record_format | Article |
series | Discover Artificial Intelligence |
spelling | doaj.art-588d0ba6fd3d4dd8b6ab418340d45088; 2022-12-21T18:12:08Z; eng; Springer; Discover Artificial Intelligence; 2731-0809; 2021-12-01; 11113; 10.1007/s44163-021-00016-y; Synthetic data use: exploring use cases to optimise data utility; Stefanie James [0]; Chris Harbron [1]; Janice Branson [2]; Mimmi Sundler [3]; Data Policy Director, Data Office, Data Science and Artificial Intelligence, Biopharmaceuticals, Research and Development, AstraZeneca, Academy House; Expert Statistical Scientist; Global Head of Advanced Methodology and Data Science, Novartis; Head of Data and AI Governance and Policy R&D, Data Office, Data Science and Artificial Intelligence, Biopharmaceuticals, Research and Development, AstraZeneca; Abstract Synthetic data is a rapidly evolving field with growing interest from multiple industry stakeholders and European bodies. In particular, the pharmaceutical industry is starting to realise the value of synthetic data which is being utilised more prevalently as a method to optimise data utility and sharing, ultimately as an innovative response to the growing demand for improved privacy. Synthetic data is data generated by simulation, based upon and mirroring properties of an original dataset. Here, with supporting viewpoints from across the pharmaceutical industry, we set out to explore use cases for synthetic data across seven key but relatable areas for optimising data utility for improved data privacy and protection. We also discuss the various methods which can be used to produce a synthetic dataset and availability of metrics to ensure robust quality of generated synthetic datasets. Lastly, we discuss the potential merits, challenges and future direction of synthetic data within the pharmaceutical industry and the considerations for this privacy enhancing technology.; https://doi.org/10.1007/s44163-021-00016-y; Synthetic data; Artificial intelligence; Privacy; Privacy enhancing technology; Pharmaceuticals; Software testing |
spellingShingle | Stefanie James; Chris Harbron; Janice Branson; Mimmi Sundler; Synthetic data use: exploring use cases to optimise data utility; Discover Artificial Intelligence; Synthetic data; Artificial intelligence; Privacy; Privacy enhancing technology; Pharmaceuticals; Software testing |
title | Synthetic data use: exploring use cases to optimise data utility |
title_full | Synthetic data use: exploring use cases to optimise data utility |
title_fullStr | Synthetic data use: exploring use cases to optimise data utility |
title_full_unstemmed | Synthetic data use: exploring use cases to optimise data utility |
title_short | Synthetic data use: exploring use cases to optimise data utility |
title_sort | synthetic data use exploring use cases to optimise data utility |
topic | Synthetic data; Artificial intelligence; Privacy; Privacy enhancing technology; Pharmaceuticals; Software testing |
url | https://doi.org/10.1007/s44163-021-00016-y |
work_keys_str_mv | AT stefaniejames syntheticdatauseexploringusecasestooptimisedatautility AT chrisharbron syntheticdatauseexploringusecasestooptimisedatautility AT janicebranson syntheticdatauseexploringusecasestooptimisedatautility AT mimmisundler syntheticdatauseexploringusecasestooptimisedatautility |