Comparing distance measures on assessed medical device incident data using Average Silhouette Width

Many machine learning algorithms depend on the choice of an appropriate similarity or distance measure. Comparing such measures in different domains and on diversely structured data is common, but is often performed with regard to an algorithm that clusters or classifies the data. In this study, data assesse...


Bibliographic Details
Main Authors: Bayer Christian, Seidel Robin
Format: Article
Language: English
Published: De Gruyter 2018-09-01
Series: Current Directions in Biomedical Engineering
Subjects:
Online Access: https://doi.org/10.1515/cdbme-2018-0126
collection DOAJ
description Many machine learning algorithms depend on the choice of an appropriate similarity or distance measure. Comparing such measures in different domains and on diversely structured data is common, but is often performed with regard to an algorithm that clusters or classifies the data. In this study, data assessed by experts is analyzed instead. The data is taken from the database of the Federal Institute for Drugs and Medical Devices (BfArM) and represents free text incident reports. The Average Silhouette Width, a cluster density measure, is used to compare the distance measures’ ability to discriminate the data according to the experts’ assessments. The Euclidean distance and four distance measures derived from the Jaccard similarity, the Simple Matching similarity, the Cosine similarity and the Yule similarity are compared on four subsets of this database. The results show that better data preprocessing is necessary, possibly due to boilerplate texts being used to write incident reports. These results will also provide a basis for comparing improvements from different methods of data preprocessing in the future.
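The comparison described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes binary term vectors, uses made-up toy data and labels, and picks one common way to turn each similarity into a distance. In particular, the Yule distance is taken here as (1 - Q) / 2, which is an assumption, since the record does not say how the distances were derived. All function names are hypothetical.

```python
from math import sqrt


def contingency(x, y):
    """Count positions with (1,1), (1,0), (0,1) and (0,0) in two binary vectors."""
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = len(x) - a - b - c
    return a, b, c, d


def euclidean_distance(x, y):
    return sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def jaccard_distance(x, y):
    a, b, c, _ = contingency(x, y)
    return 0.0 if a + b + c == 0 else 1.0 - a / (a + b + c)


def simple_matching_distance(x, y):
    a, b, c, d = contingency(x, y)
    return 1.0 - (a + d) / (a + b + c + d)


def cosine_distance(x, y):
    a, b, c, _ = contingency(x, y)
    norm = sqrt((a + b) * (a + c))
    if norm == 0:
        # At least one vector is all zeros: distance 0 only if both are.
        return 0.0 if a + b + c == 0 else 1.0
    return 1.0 - a / norm


def yule_distance(x, y):
    # Assumption: distance = (1 - Q) / 2 with Yule's Q = (ad - bc) / (ad + bc).
    a, b, c, d = contingency(x, y)
    denom = a * d + b * c
    return 0.0 if denom == 0 else (b * c) / denom


def average_silhouette_width(vectors, labels, distance):
    """Mean silhouette width of all items, using the given labels as clusters."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    if len(groups) < 2:
        raise ValueError("at least two distinct labels are required")
    n = len(vectors)
    dist = [[distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    widths = []
    for i in range(n):
        own = [j for j in groups[labels[i]] if j != i]
        if not own:
            widths.append(0.0)  # convention: singleton clusters get width 0
            continue
        a_i = sum(dist[i][j] for j in own) / len(own)
        b_i = min(
            sum(dist[i][j] for j in members) / len(members)
            for lab, members in groups.items()
            if lab != labels[i]
        )
        denom = max(a_i, b_i)
        widths.append(0.0 if denom == 0 else (b_i - a_i) / denom)
    return sum(widths) / n


# Toy data (hypothetical): four reports as binary term vectors, two expert labels.
reports = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
assessments = ["A", "A", "B", "B"]

measures = {
    "euclidean": euclidean_distance,
    "jaccard": jaccard_distance,
    "simple matching": simple_matching_distance,
    "cosine": cosine_distance,
    "yule": yule_distance,
}
for name, fn in measures.items():
    print(name, average_silhouette_width(reports, assessments, fn))
```

A value near 1 means the measure places reports with the same expert assessment close together and far from the others. Values near 0 or below suggest the measure does not separate the assessments on that data.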
id doaj.art-844bca90ff1849b1be830b13308210de
institution Directory Open Access Journal
issn 2364-5504
spelling doaj.art-844bca90ff1849b1be830b13308210de; 2022-12-22T04:35:04Z; eng; De Gruyter; Current Directions in Biomedical Engineering; ISSN 2364-5504; 2018-09-01; 4(1):525-528; 10.1515/cdbme-2018-0126; cdbme-2018-0126; Comparing distance measures on assessed medical device incident data using Average Silhouette Width; Bayer Christian, Seidel Robin; Affiliation: Institute for Drugs and Medical Devices, Bonn, Germany; https://doi.org/10.1515/cdbme-2018-0126; Keywords: average silhouette width, machine learning, distance measures, regulatory affairs, text categorization
topic average silhouette width
machine learning
distance measures
regulatory affairs
text categorization