A natural language processing approach towards harmonisation of European medicinal product information.

Product information (PI) is a vital part of any medicinal product approved for use within the European Union and consists of a summary of product characteristics (SmPC) for healthcare professionals and a package leaflet (PL) for patients, together with the product packaging. In this study, based on t...

Bibliographic Details
Main Authors: Erik Bergman, Kim Sherwood, Markus Forslund, Peter Arlett, Gabriel Westman
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0275386
collection DOAJ
description Product information (PI) is a vital part of any medicinal product approved for use within the European Union and consists of a summary of product characteristics (SmPC) for healthcare professionals and a package leaflet (PL) for patients, together with the product packaging. In this study, based on the English corpus of the EMA product information documents for all centrally approved medicinal products within the EU, a BERT sentence embedding model was used together with clustering and dimensionality reduction techniques to identify sentence similarity clusters that could be candidates for standardization. A total of 1258 medicinal products were included in the study. From these, approximately 783,000 sentences were extracted from SmPC and PL documents and aggregated into 284 and 129 semantic similarity clusters, respectively. The distribution of spread among clusters shows a separation into different cluster types. Clusters with low spread include those with identical word embeddings due to current standardization, such as section headings and standard phrases. Others show minor linguistic variations, while the group with the largest variability contains variable wording but with significant semantic overlap. The sentence clusters identified could serve as candidates for further standardization of the PI. Moving from free-text human wording to auto-generated text elements based on multiple-choice input for appropriate parts of the package leaflet and summary of product characteristics could reduce both time and complexity for applicants as well as regulators, and ultimately provide patients and prescribers with documents that are easier to understand and easier to search.
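The description above outlines a pipeline of sentence embedding, clustering, and a per-cluster spread measure. As a rough illustration only, the sketch below uses a normalised bag-of-words vector as a stand-in for a BERT sentence embedding, a greedy cosine-similarity threshold in place of whatever clustering method the authors used, and mean cosine distance to the centroid as one possible definition of spread. The function names, the threshold of 0.5, and the example sentences are all assumptions made for this sketch and are not taken from the study.

```python
import math
from collections import Counter


def toy_embed(sentence, vocab):
    """Stand-in for a BERT sentence embedding: L2-normalised word counts."""
    counts = Counter(sentence.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def centroid(vectors):
    """Mean of the vectors, re-normalised to unit length."""
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]


def greedy_clusters(vectors, threshold=0.5):
    """Put each vector in the first cluster whose centroid is similar enough.

    The threshold is an arbitrary illustrative value, not one from the study.
    """
    clusters = []  # each cluster is a list of indices into `vectors`
    for i, v in enumerate(vectors):
        for members in clusters:
            if cosine(v, centroid([vectors[j] for j in members])) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def spread(vectors):
    """Mean cosine distance of the cluster members to the cluster centroid."""
    c = centroid(vectors)
    return sum(1.0 - cosine(v, c) for v in vectors) / len(vectors)


# Illustrative sentences in the style of package leaflet text (hypothetical data).
sentences = [
    "Keep this medicine out of the sight and reach of children",
    "Keep this medicine out of the sight and reach of children",
    "Do not use this medicine after the expiry date",
    "Do not use this medicine after the expiry date stated on the carton",
]
vocab = sorted({w for s in sentences for w in s.lower().split()})
vectors = [toy_embed(s, vocab) for s in sentences]
clusters = greedy_clusters(vectors)
spreads = [spread([vectors[i] for i in members]) for members in clusters]
for members, s in zip(clusters, spreads):
    print(members, round(s, 3))
```

In this toy setting the two identical sentences form a cluster with a spread of essentially zero, while the two expiry-date variants form a cluster with a small positive spread, which mirrors the distinction the description draws between fully standardised phrases and phrases with minor linguistic variation.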
id doaj.art-ae4cbce10d1d45899338b3a8fa6a283f
institution Directory Open Access Journal
issn 1932-6203