INDEPENDENT COMPONENT ANALYSIS FOR INITIAL APPROXIMATION DETERMINATION IN IDENTIFICATION OF ACTIVE MODULES IN BIOLOGICAL GRAPHS

Subject of Research. Identifying active modules in biological graphs, such as gene graphs, is an important approach to interpreting experimental biological data. One way to solve this problem is to apply an algorithm for joint clustering in net...

Bibliographic Details
Main Authors: Anastasiia N. Gainullina, Vladimir D. Sukhov, Anatoly A. Shalyto, Alexey A. Sergushichev
Format: Article
Language: English
Published: Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) 2020-12-01
Series: Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
Subjects:
Online Access: https://ntv.ifmo.ru/file/article/20016.pdf
author Anastasiia N. Gainullina
Vladimir D. Sukhov
Anatoly A. Shalyto
Alexey A. Sergushichev
collection DOAJ
description Subject of Research. Identifying active modules in biological graphs, such as gene graphs, is an important approach to interpreting experimental biological data. One way to solve this problem is to apply an algorithm for joint clustering in network and correlation spaces. The algorithm finds groups of genes that are both close to each other in the gene graph and highly pairwise correlated according to the matrix of gene expression values. The algorithm is iterative, and one of its key parameters is the initial approximation, which affects both the run time and the quality of the results. We consider the problem of determining an initial approximation for this algorithm and propose a procedure based on independent component analysis to solve it. Method. In the first step of the proposed procedure, independent component analysis is applied to a centered matrix of expression values. Then, for each independent component, the genes specific to that component at a given level of statistical significance are identified. The gene groups obtained for all independent components are taken as the initial approximation. Main Results. The procedure based on independent component analysis reduces the number of gene groups in the initial approximation without loss of accuracy. This, in turn, speeds up the clustering algorithm by an order of magnitude while maintaining the quality of the results. Practical Relevance. Accelerating the algorithm for joint clustering in network and correlation spaces without loss of result quality makes it significantly more convenient and simplifies its application to the interpretation of transcriptome data in bioinformatics and computational biology.
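The Method steps in the description (center the expression matrix, run independent component analysis, keep the genes that are significant for each component) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the ICA variant or the significance test, so the symmetric FastICA with a tanh nonlinearity, the two-sided z-score cutoff, and every function and parameter name below (fast_ica, initial_gene_groups, n_components, alpha) are assumptions made for illustration.

```python
from statistics import NormalDist

import numpy as np


def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W so that the rows of W become orthonormal."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W


def fast_ica(X, n_components, n_iter=500, tol=1e-8, seed=0):
    """Symmetric FastICA with a tanh nonlinearity (assumed ICA variant).

    X has shape (signals, observations). The result has shape
    (n_components, observations) with zero-mean, unit-variance rows.
    """
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten: project onto the top principal directions and rescale them.
    vals, vecs = np.linalg.eigh(X @ X.T / X.shape[1])
    top = np.argsort(vals)[::-1][:n_components]
    Z = (vecs[:, top] / np.sqrt(vals[top])).T @ X

    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w for every row w.
        W_new = G @ Z.T / Z.shape[1] - (1.0 - G**2).mean(axis=1)[:, None] * W
        W_new = _sym_decorrelate(W_new)
        # Converged when every row is parallel to its previous value.
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break
    return W @ Z


def initial_gene_groups(expression, n_components, alpha=0.01):
    """Return one list of gene row indices per independent component.

    expression is a genes x samples matrix. A gene is assigned to a
    component when its standardized weight passes a two-sided normal
    cutoff at level alpha (an assumed stand-in for the significance test).
    """
    E = np.asarray(expression, dtype=float)
    centered = E - E.mean(axis=1, keepdims=True)   # center each gene
    sources = fast_ica(centered.T, n_components)   # components x genes
    z = (sources - sources.mean(axis=1, keepdims=True)) / sources.std(
        axis=1, keepdims=True
    )
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return [np.flatnonzero(np.abs(row) > z_crit).tolist() for row in z]
```

Calling initial_gene_groups(E, n_components=k) on a genes x samples array E returns k gene groups that could seed the clustering algorithm. Because the number of groups equals the number of components, a small k keeps the initial approximation compact, which is the property the Main Results attribute the speed-up to.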
first_indexed 2024-12-23T23:40:52Z
format Article
id doaj.art-f3640e265ffd4fd991ca9cd6cf19be9e
institution Directory Open Access Journal
issn 2226-1494
2500-0373
language English
last_indexed 2024-12-23T23:40:52Z
publishDate 2020-12-01
publisher Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University)
record_format Article
series Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
spelling doaj.art-f3640e265ffd4fd991ca9cd6cf19be9e 2022-12-21T17:25:40Z eng
Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University)
Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki, ISSN 2226-1494, 2500-0373
2020-12-01, vol. 20, no. 6, pp. 888-892
https://doi.org/10.17586/2226-1494-2020-20-6-888-892
Anastasiia N. Gainullina, https://orcid.org/0000-0003-3796-2337, Software Developer, ITMO University, Saint Petersburg, 197101, Russian Federation
Vladimir D. Sukhov, https://orcid.org/0000-0002-5169-1433, Software Developer, ITMO University, Saint Petersburg, 197101, Russian Federation
Anatoly A. Shalyto, https://orcid.org/0000-0002-2723-2077, D.Sc., Professor, Chief Researcher, ITMO University, Saint Petersburg, 197101, Russian Federation
Alexey A. Sergushichev, https://orcid.org/0000-0003-1159-7220, PhD, Associate Professor, ITMO University, Saint Petersburg, 197101, Russian Federation
title INDEPENDENT COMPONENT ANALYSIS FOR INITIAL APPROXIMATION DETERMINATION IN IDENTIFICATION OF ACTIVE MODULES IN BIOLOGICAL GRAPHS
topic clustering
correlation
independent component analysis
graphs
gene expression
url https://ntv.ifmo.ru/file/article/20016.pdf