Metrics for GO based protein semantic similarity: a systematic evaluation

Abstract. Background: Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic simila...

Bibliographic Details
Main Authors: Falcão André O, Ferreira António EN, Bastos Hugo, Faria Daniel, Pesquita Catia, Couto Francisco M
Format: Article
Language: English
Published: BMC 2008-04-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/S5/S4
collection DOAJ
description Abstract. Background: Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue is whether or not electronic annotations should be used in semantic similarity calculations. Results: We conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture an identical behaviour, but differ in resolution, we used the latter as the main criterion of evaluation. Conclusions: This work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We have found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure using a best-match average combination approach. We have also found that the average and maximum combination approaches are problematic since both are inherently influenced by the number of terms being combined. We suspect that there may be a direct influence of data circularity in the behaviour of the results including electronic annotations, as a result of functional inference from sequence similarity.
first_indexed 2024-12-18T11:25:48Z
format Article
id doaj.art-baad1e199dd741d082ea95ffb818d15e
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-18T11:25:48Z
publishDate 2008-04-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-baad1e199dd741d082ea95ffb818d15e; 2022-12-21T21:09:42Z; eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2008-04-01; vol. 9, Suppl 5, S4; doi:10.1186/1471-2105-9-S5-S4