Filtering Degenerate Patterns with Application to Protein Sequence Analysis
In biology, the notion of degenerate pattern plays a central role for describing various phenomena. For example, protein active site patterns, like those contained in the PROSITE database, e.g., [FY]DPC[LIM][ASG]C[ASG], are, in general, represented by degenerate patterns with character classes. Res...
Main Authors: | Matteo Comin, Davide Verzotto |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2013-05-01 |
Series: | Algorithms |
Subjects: | pattern discovery and filtering; degenerate patterns; analysis of biological data |
Online Access: | http://www.mdpi.com/1999-4893/6/2/352 |
_version_ | 1818907719037353984 |
---|---|
author | Matteo Comin, Davide Verzotto |
author_facet | Matteo Comin, Davide Verzotto |
author_sort | Matteo Comin |
collection | DOAJ |
description | In biology, the notion of degenerate pattern plays a central role in describing various phenomena. For example, protein active site patterns, like those contained in the PROSITE database, e.g., [FY]DPC[LIM][ASG]C[ASG], are, in general, represented by degenerate patterns with character classes. Researchers have developed several approaches over the years to discover degenerate patterns. Although these methods have been exhaustively and successfully tested on genomes and proteins, their outcomes often far exceed the size of the original input, making the output hard to manage and to interpret through refined analysis that requires manual inspection. In this paper, we discuss a characterization of degenerate patterns with character classes, without gaps, and we introduce the concept of pattern priority for comparing and ranking different patterns. We define the class of underlying patterns for filtering any set of degenerate patterns into a new set that is linear in the size of the input sequence. We present some preliminary results on the detection of subtle signals in protein families. Results show that our approach drastically reduces the number of patterns output by a protein analysis tool, while retaining the representative patterns. |
first_indexed | 2024-12-19T21:59:35Z |
format | Article |
id | doaj.art-69b3fc78e9f0429daca5c0b4107c4056 |
institution | Directory Open Access Journal |
issn | 1999-4893 |
language | English |
last_indexed | 2024-12-19T21:59:35Z |
publishDate | 2013-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Algorithms |
spelling | doaj.art-69b3fc78e9f0429daca5c0b4107c4056; 2022-12-21T20:04:11Z; eng; MDPI AG; Algorithms; 1999-4893; 2013-05-01; 6; 2; 352; 370; 10.3390/a6020352; Filtering Degenerate Patterns with Application to Protein Sequence Analysis; Matteo Comin; Davide Verzotto; In biology, the notion of degenerate pattern plays a central role for describing various phenomena. For example, protein active site patterns, like those contained in the PROSITE database, e.g., [FY]DPC[LIM][ASG]C[ASG], are, in general, represented by degenerate patterns with character classes. Researchers have developed several approaches over the years to discover degenerate patterns. Although these methods have been exhaustively and successfully tested on genomes and proteins, their outcomes often far exceed the size of the original input, making the output hard to be managed and to be interpreted by refined analysis requiring manual inspection. In this paper, we discuss a characterization of degenerate patterns with character classes, without gaps, and we introduce the concept of pattern priority for comparing and ranking different patterns. We define the class of underlying patterns for filtering any set of degenerate patterns into a new set that is linear in the size of the input sequence. We present some preliminary results on the detection of subtle signals in protein families. Results show that our approach drastically reduces the number of patterns in output for a tool for protein analysis, while retaining the representative patterns.; http://www.mdpi.com/1999-4893/6/2/352; pattern discovery and filtering; degenerate patterns; analysis of biological data |
spellingShingle | Matteo Comin Davide Verzotto Filtering Degenerate Patterns with Application to Protein Sequence Analysis Algorithms pattern discovery and filtering degenerate patterns analysis of biological data |
title | Filtering Degenerate Patterns with Application to Protein Sequence Analysis |
title_full | Filtering Degenerate Patterns with Application to Protein Sequence Analysis |
title_fullStr | Filtering Degenerate Patterns with Application to Protein Sequence Analysis |
title_full_unstemmed | Filtering Degenerate Patterns with Application to Protein Sequence Analysis |
title_short | Filtering Degenerate Patterns with Application to Protein Sequence Analysis |
title_sort | filtering degenerate patterns with application to protein sequence analysis |
topic | pattern discovery and filtering; degenerate patterns; analysis of biological data |
url | http://www.mdpi.com/1999-4893/6/2/352 |
work_keys_str_mv | AT matteocomin filteringdegeneratepatternswithapplicationtoproteinsequenceanalysis AT davideverzotto filteringdegeneratepatternswithapplicationtoproteinsequenceanalysis |
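
The abstract's example of a degenerate pattern with character classes, [FY]DPC[LIM][ASG]C[ASG], can be read as follows: each bracketed group matches any one of the listed residues, and each bare letter matches itself. The sketch below is only an illustration of that notation, not the filtering method of the paper. It assumes the pattern is gap-free and contains nothing but uppercase letters and bracketed classes, in which case it is already valid regular-expression syntax. The function name `find_occurrences` and the toy sequence are hypothetical and were made up for this example.

```python
import re

# Pattern quoted in the abstract; each bracketed group is a character class
# (any one of the listed residues), and bare letters match themselves.
PATTERN = "[FY]DPC[LIM][ASG]C[ASG]"


def find_occurrences(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (start, matched substring) for every occurrence, overlaps included.

    Hypothetical helper: assumes a gap-free pattern made only of uppercase
    letters and bracketed character classes, so it can be used as a regex.
    """
    # Wrapping the pattern in a lookahead lets overlapping occurrences be found.
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# Toy sequence invented for illustration; it is not real protein data.
toy = "MKFDPCLACSAAYDPCMGCGTT"
print(find_occurrences(PATTERN, toy))
# prints [(2, 'FDPCLACS'), (12, 'YDPCMGCG')]
```

Both reported substrings are distinct concrete strings covered by the same degenerate pattern, which is why sets of such patterns can grow much larger than the input they describe.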