A comprehensive system for evaluation of remote sequence similarity detection

Bibliographic Details
Main Authors: Kim Bong-Hyun, Wang Yong, Sadreyev Ruslan I, Qi Yuan, Grishin Nick V
Format: Article
Language: English
Published: BMC 2007-08-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/8/314
collection DOAJ
description <p>Abstract</p> <p>Background</p> <p>Accurate and sensitive performance evaluation is crucial both for the effective development of better structure prediction methods based on sequence similarity and for the comparative analysis of existing methods. To date, there has been no satisfactory comprehensive evaluation method that (i) is based on a large and statistically unbiased set of proteins with clearly defined relationships; and (ii) covers all performance aspects of sequence-based structure predictors, such as sensitivity and specificity, alignment accuracy and coverage, and structure template quality.</p> <p>Results</p> <p>With the aim of designing such a method, we (i) select a statistically balanced set of divergent protein domains from SCOP, and define similarity relationships for the majority of these domains by complementing the best information available in SCOP with a rigorous SVM-based algorithm; and (ii) develop protocols for assessing similarity detection and alignment quality from several complementary perspectives. The evaluation of similarity detection is based on ROC-like curves and includes several complementary approaches to defining true and false positives. Reference-dependent approaches use the 'gold standard' of predefined domain relationships and structure-based alignments. Reference-independent approaches assess the quality of the structural match predicted by the sequence alignment, with respect to either the whole domain length (global mode) or the aligned region only (local mode). Similarly, the evaluation of alignment quality includes several reference-dependent and reference-independent measures, in global and local modes.
As an illustration, we use our benchmark to compare the performance of several methods for detecting remote sequence similarities, and show that different aspects of evaluation reveal different properties of the evaluated methods, highlighting their advantages, weaknesses, and potential for further development.</p> <p>Conclusion</p> <p>The presented benchmark provides a new tool for a statistically unbiased assessment of methods for remote sequence similarity detection from various complementary perspectives. This tool should be useful both for users choosing the best method for a given purpose and for developers designing new, more powerful methods. The benchmark set, reference alignments, and evaluation code can be downloaded from <url>ftp://iole.swmed.edu/pub/evaluation/</url>.</p>
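The abstract evaluates similarity detection with ROC-like curves built from true/false positive labels. As a rough illustration of the general idea only (this is not the benchmark's distributed evaluation code), the sketch below ranks hits by score and records cumulative false-positive and true-positive counts. The function names `roc_like_curve` and `tp_at_fp` are hypothetical, and the boolean label for each hit is assumed to come from whichever true/false positive definition is in use (for example, reference-dependent domain relationships).

```python
from typing import List, Tuple


def roc_like_curve(hits: List[Tuple[float, bool]]) -> List[Tuple[int, int]]:
    """Return cumulative (false_positives, true_positives) after each ranked hit.

    hits: hypothetical (score, is_true_positive) pairs for every query-target
    pair reported by a method; a higher score means a more confident hit.
    Tied scores keep their input order (Python's sort is stable).
    """
    # Rank all hits from most to least confident.
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    tp = fp = 0
    curve = []
    for _score, is_true in ranked:
        # Each hit moves the curve one step up (TP) or one step right (FP).
        if is_true:
            tp += 1
        else:
            fp += 1
        curve.append((fp, tp))
    return curve


def tp_at_fp(curve: List[Tuple[int, int]], max_fp: int) -> int:
    """True positives collected before the false-positive count exceeds max_fp."""
    best = 0
    for fp, tp in curve:
        if fp > max_fp:
            break
        best = tp
    return best


if __name__ == "__main__":
    example = [(9.1, True), (7.4, True), (6.0, False), (5.2, True), (3.3, False)]
    curve = roc_like_curve(example)
    print(curve)
    print(tp_at_fp(curve, 0), tp_at_fp(curve, 1))
```

Reading the curve at a fixed false-positive budget is one common way to compare methods; how the labels are defined (reference-dependent versus reference-independent, global versus local mode) changes the curve without changing this counting step.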
issn 1471-2105
doi 10.1186/1471-2105-8-314