Assessment of weighted topological overlap (wTO) to improve fidelity of gene co-expression networks
Abstract Background: For more than a decade, gene expression data sets have been used as a basis for the construction of co-expression networks used in systems biology investigations, leading to many important discoveries in a wide range of subjects spanning human disease to evolution and the developme...
Main Authors: | André Voigt, Eivind Almaas |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2019-01-01 |
Series: | BMC Bioinformatics |
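Subjects: | Gene co-expression network, Weighted topological overlap, Correlation network, Biological network analysis |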
Online Access: | http://link.springer.com/article/10.1186/s12859-019-2596-9 |
_version_ | 1818306258665472000 |
---|---|
author | André Voigt Eivind Almaas |
author_facet | André Voigt Eivind Almaas |
author_sort | André Voigt |
collection | DOAJ |
description | Abstract Background: For more than a decade, gene expression data sets have been used as a basis for the construction of co-expression networks in systems biology investigations, leading to many important discoveries in subjects ranging from human disease to evolution and the development of organisms. A commonly encountered challenge in such investigations is first detecting, then removing, spurious correlations (i.e. links) in these networks. While access to a large number of measurements per gene would reduce this problem, often only a small number of measurements are available. The weighted Topological Overlap (wTO) measure, which incorporates information from the shared network neighborhood of a given gene pair into a single score, is frequently used with the implicit expectation of producing higher-quality networks. However, the actual extent to which wTO improves the accuracy of a co-expression analysis has not been quantified. Results: Here, we used a large-sample biological data set containing 338 gene-expression measurements per gene as a reference system. From these data, we generated ensembles consisting of 10, 20 and 50 randomly selected measurements to emulate low-quality data sets, finding that the wTO measure consistently generates more robust scores than simple correlation calculations. Furthermore, for the data sets consisting of only 10 and 20 samples per gene, we find that wTO serves as a better predictor of the correlation scores generated from the full data set. However, we find that using wTO as a score for network building substantially alters several topological aspects of the resulting networks, with no conclusive evidence that the resulting structure is more accurate. Importantly, we find that the much-used approach of applying a soft-threshold modifier to link weights prior to computing the wTO substantially decreases the robustness of the resulting wTO network, but increases the predictive power of wTO networks with regard to the reference correlation (soft threshold) network, particularly as the size of the data sets increases. Conclusion: Our analysis demonstrates that, in agreement with previous assumptions, the wTO approach is capable of significantly improving the fidelity of co-expression networks, and that this effect is especially evident for low-sample-number gene-expression data sets. |
first_indexed | 2024-12-13T06:39:38Z |
format | Article |
id | doaj.art-4aba1eec232c47c1b91db452920acc7b |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-13T06:39:38Z |
publishDate | 2019-01-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-4aba1eec232c47c1b91db452920acc7b; 2022-12-21T23:56:27Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2019-01-01; 20; 1; 1; 11; 10.1186/s12859-019-2596-9; Assessment of weighted topological overlap (wTO) to improve fidelity of gene co-expression networks; André Voigt (0), Eivind Almaas (1); (0) Network Systems Biology Group, Department of Biotechnology, NTNU - Norwegian University of Science and Technology; (1) Network Systems Biology Group, Department of Biotechnology, NTNU - Norwegian University of Science and Technology; http://link.springer.com/article/10.1186/s12859-019-2596-9; Gene co-expression network; Weighted topological overlap; Correlation network; Biological network analysis |
title | Assessment of weighted topological overlap (wTO) to improve fidelity of gene co-expression networks |
title_sort | assessment of weighted topological overlap wto to improve fidelity of gene co expression networks |
topic | Gene co-expression network Weighted topological overlap Correlation network Biological network analysis |
url | http://link.springer.com/article/10.1186/s12859-019-2596-9 |
work_keys_str_mv | AT andrevoigt assessmentofweightedtopologicaloverlapwtotoimprovefidelityofgenecoexpressionnetworks AT eivindalmaas assessmentofweightedtopologicaloverlapwtotoimprovefidelityofgenecoexpressionnetworks |
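
The description in this record names three computational steps without spelling them out: drawing small random subsets of the samples, optionally applying a soft threshold to the correlation weights, and computing wTO from the shared network neighborhood of each gene pair. The sketch below illustrates those steps in Python. The record gives no formula, so everything specific here is an assumption rather than the article's method: the wTO expression is one commonly used signed formulation, the soft threshold is a sign-preserving power with a hypothetical exponent `beta`, and the function names are invented for illustration.

```python
import numpy as np


def subsample_correlation(expr, n_samples, rng):
    """Pearson correlation between genes from a random subset of samples.

    expr is a (genes x samples) array; n_samples columns are drawn
    without replacement (hypothetical helper, not from the article).
    """
    cols = rng.choice(expr.shape[1], size=n_samples, replace=False)
    return np.corrcoef(expr[:, cols])


def soft_threshold(weights, beta=6):
    """Sign-preserving power transform: sign(a) * |a|**beta (assumed form)."""
    return np.sign(weights) * np.abs(weights) ** beta


def wto(adj):
    """Weighted topological overlap for a signed weight matrix.

    Assumed formulation:
        w_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - |a_ij|)
    with k_i = sum_j |a_ij| and the diagonal of a set to zero.
    """
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)           # no self-links
    k = np.abs(a).sum(axis=1)          # weighted connectivity per gene
    shared = a @ a                     # contribution of shared neighbors
    denom = np.minimum.outer(k, k) + 1.0 - np.abs(a)
    w = (shared + a) / denom
    np.fill_diagonal(w, 0.0)           # self-overlap is not of interest here
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for an expression matrix: 5 genes x 338 samples.
    expr = rng.normal(size=(5, 338))
    corr_small = subsample_correlation(expr, 10, rng)
    print(wto(corr_small).round(3))
    print(wto(soft_threshold(corr_small)).round(3))
```

In this formulation the denominator is never smaller than 1, so no division-by-zero guard is needed. The main block uses synthetic random data purely to show the call sequence; it does not reproduce any result reported in the article.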