Comparison Analysis of Gene Expression Profiles Proximity Metrics

Gene regulatory network (GRN) reconstruction and the creation of effective disease diagnostic systems based on gene expression data are among the current directions of modern bioinformatics. In this manuscript, we present the results of research focused on the evaluation of th...

Bibliographic Details
Main Authors: Sergii Babichev, Lyudmyla Yasinska-Damri, Igor Liakh, Bohdan Durnyak
Format: Article
Language: English
Published: MDPI AG 2021-09-01
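Series: Symmetry
Subjects: symmetry of molecular elements interactions; gene expression profiles; mutual information maximization criterion; correlation distance; Pearson’s χ² test; Harrington desirability index
Online Access: https://www.mdpi.com/2073-8994/13/10/1812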
author Sergii Babichev
Lyudmyla Yasinska-Damri
Igor Liakh
Bohdan Durnyak
collection DOAJ
description Gene regulatory network (GRN) reconstruction and the creation of effective disease diagnostic systems based on gene expression data are among the current directions of modern bioinformatics. In this manuscript, we present the results of research focused on evaluating the effectiveness of the most widely used metrics for estimating the proximity of gene expression profiles, which can be used to extract groups of informative gene expression profiles while taking into account the states of the investigated samples. Symmetry is important for gene and protein interactions, since it undergirds essentially all interactions between molecular components in the GRN. The extraction of gene expression profiles that identify the state of the investigated biological objects (disease, state of patients, etc.) contributes to the further reconstruction of the GRN in terms of both symmetry and understanding of the mechanism of molecular element interaction in a biological organism. Within the framework of our research, we investigated the following metrics: mutual information maximization (MIM) using various methods of Shannon entropy calculation, Pearson’s χ² test, and correlation distance. The classification accuracy of the investigated samples was used as the main quality criterion to evaluate the effectiveness of each metric. The random forest (RF) classifier was used during the simulation process. The results have shown that the various methods of Shannon entropy calculation, when used within the MIM metric, disagree with each other. As a result, we have proposed the modified mutual information maximization (MMIM) proximity metric, based on the joint use of various methods of Shannon entropy calculation and the Harrington desirability function. The simulation results have also shown that the correlation proximity metric is less effective than both the MMIM metric and Pearson’s χ² test. Finally, we propose the hybrid proximity metric (HPM), which considers both the MMIM metric and Pearson’s χ² test. The proposed metric was investigated within the framework of one-cluster structure effectiveness evaluation. In our view, the main benefit of the proposed HPM is that it increases the objectivity of extracting mutually similar gene expression profiles, owing to the joint use of several effective proximity metrics that can contradict each other when used alone.
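Two of the proximity measures named in the description, correlation distance and mutual information, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not say which Shannon entropy estimators the paper uses, so the plug-in (frequency-count) estimator, the equal-width binning, and all function names below are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def correlation_distance(x, y):
    """Return 1 - Pearson correlation for two equal-length profiles.

    Undefined (division by zero) if either profile is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def discretize(x, bins=3):
    """Equal-width binning (an assumed, hypothetical choice of discretization)."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant profile
    return [min(int((v - lo) / width), bins - 1) for v in x]


def entropy(labels):
    """Plug-in Shannon entropy in bits, computed from label frequencies."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y, bins=3):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) on the discretized profiles."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    return entropy(dx) + entropy(dy) - entropy(list(zip(dx, dy)))


# Two perfectly correlated toy profiles (made-up values, not real expression data).
profile_a = [1, 2, 3, 4, 5, 6]
profile_b = [2, 4, 6, 8, 10, 12]
dist = correlation_distance(profile_a, profile_b)
mi = mutual_information(profile_a, profile_b)
```

A small correlation distance and a large mutual information both indicate proximity, but they are measured on different scales. That difference is one reason a combined score, such as the desirability-based MMIM and HPM metrics mentioned in the description, needs an explicit normalization step.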
first_indexed 2024-03-10T06:10:26Z
format Article
id doaj.art-6858910598be4c57bde23518fb326870
institution Directory Open Access Journal
issn 2073-8994
language English
last_indexed 2024-03-10T06:10:26Z
publishDate 2021-09-01
publisher MDPI AG
record_format Article
series Symmetry
spelling doaj.art-6858910598be4c57bde23518fb326870
2023-11-22T20:09:27Z
eng
MDPI AG
Symmetry
ISSN 2073-8994
2021-09-01
Volume 13, Issue 10, Article 1812
DOI 10.3390/sym13101812
Comparison Analysis of Gene Expression Profiles Proximity Metrics
Sergii Babichev; Lyudmyla Yasinska-Damri; Igor Liakh; Bohdan Durnyak
Department of Physics, Kherson State University, 73000 Kherson, Ukraine
Department of Computer Science and Information Technology, Ukrainian Academy of Printing, 79000 Lviv, Ukraine
Department of Informatics, Physical and Mathematical Disciplines, Uzhhorod National University, 88000 Uzhhorod, Ukraine
Department of Computer Science and Information Technology, Ukrainian Academy of Printing, 79000 Lviv, Ukraine
https://www.mdpi.com/2073-8994/13/10/1812
title Comparison Analysis of Gene Expression Profiles Proximity Metrics
topic symmetry of molecular elements interactions
gene expression profiles
mutual information maximization criterion
correlation distance
Pearson’s χ² test
Harrington desirability index
url https://www.mdpi.com/2073-8994/13/10/1812