Enhancing rare disease diagnosis: a weighted cosine similarity approach for improved k-nearest neighbor algorithm

Diagnosing rare diseases is challenging because they affect only a restricted group of individuals, usually identified as one out of every 2,000 people within the European Union and no more than one out of 1,250 individuals in the United States. This makes it difficult for doctors to recognize the s...

Bibliographic details
Main authors: Abokadr, Somiya; Azman, Azreen; Hamdan, Hazlina; Amelina, Nurul
Format: Article
Published: Little Lion Scientific 2023
collection UPM
description Diagnosing rare diseases is challenging because they affect only a small group of individuals, usually defined as one in every 2,000 people in the European Union and no more than one in 1,250 people in the United States. This low prevalence makes it difficult for doctors to recognize the symptoms of these diseases. Machine learning techniques also struggle to classify patients with rare diseases, because the small sample sizes lead to biased results. To address this issue, the authors propose a weighted cosine similarity as the distance measure for the k-nearest neighbours algorithm, in place of the conventional cosine similarity. Genetic optimization is used to select the best weights for the weighted cosine similarity. The Rare Metabolic Diseases Database was used as a case study, and the results demonstrated that reducing the classification bias between majority and minority classes improves all classification performance measures. However, as the number of classes and the imbalance ratio increase, the approach's effectiveness decreases, eventually reaching zero. Future work will focus on reformulating the g-mean to smooth its values and avoid assigning a zero score when all instances of a class are misclassified.
first_indexed 2024-12-09T02:19:29Z
format Article
id upm.eprints-107707
institution Universiti Putra Malaysia
spelling upm.eprints-107707 2024-10-28T01:59:07Z http://psasir.upm.edu.my/id/eprint/107707/ Little Lion Scientific 2023-09-15 Article PeerReviewed Abokadr, Somiya and Azman, Azreen and Hamdan, Hazlina and Amelina, Nurul (2023) Enhancing rare disease diagnosis: a weighted cosine similarity approach for improved k-nearest neighbor algorithm. Journal of Theoretical and Applied Information Technology, 101 (17). pp. 6815-6824. ISSN 1992-8645; eISSN: 1817-3195 http://www.jatit.org/
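
The abstract describes a k-nearest neighbours classifier that ranks neighbours by a weighted cosine similarity, and a g-mean score that drops to zero when every instance of a class is misclassified. The record does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions: the weighted cosine form shown is one common formulation, the function names are hypothetical, and the weights are fixed by hand rather than chosen by genetic optimization.

```python
import math
from collections import Counter


def weighted_cosine_similarity(x, y, w):
    """Assumed form: sum(w*x*y) / (sqrt(sum(w*x^2)) * sqrt(sum(w*y^2))).

    Weights are assumed non-negative; the paper's exact definition may differ.
    """
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    norm_x = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    norm_y = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # similarity with a zero vector is undefined; treat as 0
    return num / (norm_x * norm_y)


def knn_predict(train_X, train_y, query, w, k=3):
    """Majority vote among the k training points most similar to the query."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: weighted_cosine_similarity(pair[0], query, w),
        reverse=True,  # higher similarity means a closer neighbour
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    product = 1.0
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        product *= hits / total  # a recall of 0 for any class zeroes the product
    return product ** (1.0 / len(classes))


# Toy data (invented for illustration, not from the paper's database).
train_X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
train_y = ["common", "common", "rare"]
weights = [1.0, 1.0]  # hand-picked; the paper selects weights by genetic optimization

prediction = knn_predict(train_X, train_y, [0.1, 1.0], weights, k=1)
score_when_rare_missed = g_mean(["common", "common", "rare"],
                                ["common", "common", "common"])
```

In this toy run the single rare instance is predicted as "common", so its recall is 0 and the g-mean is 0 even though two of three predictions are correct. This is the behaviour the abstract's future work aims to smooth.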