GediNET for discovering gene associations across diseases using knowledge based machine learning approach

Bibliographic Details
Main Authors: Emma Qumsiyeh, Louise Showe, Malik Yousef
Format: Article
Language: English
Published: Nature Portfolio 2022-11-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-022-24421-0

Abstract: The most common approaches to discovering genes associated with specific diseases are based on machine learning and use a variety of feature selection techniques to identify significant genes that can serve as biomarkers for a given disease. More recently, integrating prior knowledge-based approaches into this process has shown significant promise in the discovery of new biomarkers with potential translational applications. In this study, we developed a novel approach, GediNET, that integrates prior biological knowledge into gene Groups shown to be associated with a specific disease, such as a cancer. The novelty of GediNET is that it also allows the discovery of significant associations between that specific disease and other diseases. The process begins with the identification of gene Groups. The Groups are then passed to a Scoring component that identifies the top-performing classification Groups, and the top-ranked gene Groups are used to train a Machine Learning Model. GediNET uses this process of Grouping, Scoring and Modelling (G-S-M) to identify other diseases that are similarly associated with the resulting signature. It identifies these relationships through Disease–Disease Association (DDA) based machine learning. DDA explores novel associations between diseases and identifies relationships that could be used to further improve approaches to diagnosis, prognosis, and treatment. The GediNET KNIME workflow can be downloaded from https://github.com/malikyousef/GediNET.git or https://kni.me/w/3kH1SQV_mMUsMTS.
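The abstract describes the Grouping, Scoring and Modelling (G-S-M) loop only at a high level. As a rough illustration, the Python sketch below assumes that each Group is a named set of disease-associated genes, scores a Group by the leave-one-out accuracy of a simple nearest-centroid classifier restricted to that Group's genes, and trains the final model on the union of genes from the top-ranked Groups. The Group names, toy expression values, the nearest-centroid classifier and the `gsm` helper are all hypothetical stand-ins; GediNET itself is a KNIME workflow, and its actual scoring and modelling components are not specified in this record.

```python
# Toy Grouping-Scoring-Modelling (G-S-M) sketch. NOT GediNET's implementation:
# the Groups, the nearest-centroid classifier and leave-one-out scoring are
# illustrative assumptions.
from statistics import mean


def centroid(rows):
    """Per-gene mean of a list of expression vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (ties -> first label)."""
    labels = sorted(set(train_y))
    cents = {lab: centroid([r for r, t in zip(train_X, train_y) if t == lab]) for lab in labels}
    return min(labels, key=lambda lab: sq_dist(cents[lab], x))


def loo_accuracy(X, y):
    """Leave-one-out accuracy of the nearest-centroid classifier."""
    hits = sum(
        nearest_centroid(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]) == y[i]
        for i in range(len(X))
    )
    return hits / len(X)


def project(X, genes, gene_index):
    """Keep only the expression columns for the given genes."""
    return [[row[gene_index[g]] for g in genes] for row in X]


def gsm(X, y, genes, groups, top_k=2):
    gene_index = {g: i for i, g in enumerate(genes)}
    # Grouping: restrict each hypothetical Group to genes that were measured.
    measured = {name: [g for g in members if g in gene_index] for name, members in groups.items()}
    measured = {name: gs for name, gs in measured.items() if gs}
    # Scoring: rank Groups by how well their genes alone separate the classes.
    ranked = sorted(
        ((name, loo_accuracy(project(X, gs, gene_index), y)) for name, gs in measured.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    # Modelling: train on the union of genes from the top-k Groups.
    selected = sorted({g for name, _ in ranked[:top_k] for g in measured[name]})
    train_X = project(X, selected, gene_index)

    def predict(sample):
        return nearest_centroid(train_X, y, project([sample], selected, gene_index)[0])

    return ranked, selected, predict


# Hypothetical toy data: 6 samples x 5 genes, two classes.
GENES = ["G1", "G2", "G3", "G4", "G5"]
X = [
    [5.0, 4.8, 1.0, 2.0, 3.0],
    [5.2, 5.1, 2.0, 1.0, 2.9],
    [4.9, 5.0, 1.5, 1.5, 1.0],
    [1.0, 1.2, 1.0, 2.0, 1.1],
    [0.8, 1.1, 2.0, 1.0, 3.1],
    [1.1, 0.9, 1.5, 1.5, 1.0],
]
y = ["case"] * 3 + ["control"] * 3
GROUPS = {  # hypothetical disease -> gene sets
    "disease_A": ["G1", "G2"],
    "disease_B": ["G3", "G4"],
    "disease_C": ["G5", "G9"],  # G9 is not measured and is dropped
}

ranked, selected, predict = gsm(X, y, GENES, GROUPS, top_k=2)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
print("selected genes:", selected)
print("prediction:", predict([5.0, 5.0, 1.0, 1.0, 2.0]))
```

On this toy data, disease_A (whose genes separate the classes cleanly) ranks first and disease_B (pure noise) ranks last. Reading the ranked Group names as candidate disease associations is only a loose analogue of the DDA step the abstract mentions, not a reproduction of it.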

ISSN: 2045-2322
Volume: 12, Issue: 1
Affiliations: Emma Qumsiyeh, Information Technology Engineering, Al-Quds University; Louise Showe, The Wistar Institute; Malik Yousef, Department of Information Systems, Zefat Academic College