Filtering Degenerate Patterns with Application to Protein Sequence Analysis

In biology, the notion of a degenerate pattern plays a central role in describing various phenomena. For example, protein active site patterns, like those contained in the PROSITE database, e.g., [FY]DPC[LIM][ASG]C[ASG], are, in general, represented by degenerate patterns with character classes. Res...
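As an illustration of what a degenerate pattern with character classes means in practice, here is a minimal sketch that locates the occurrences of the example pattern in a sequence. It assumes the pattern is written without gaps, so it can be handed directly to Python's regular expression engine. The function name `occurrences` and the toy sequence are hypothetical and are not taken from the article or from PROSITE.

```python
import re


def occurrences(pattern: str, sequence: str) -> list[int]:
    """Return the 0-based start positions of all matches, overlaps included."""
    # Wrapping the pattern in a lookahead makes the regex engine report
    # overlapping occurrences instead of skipping past each match.
    regex = re.compile(f"(?={pattern})")
    return [m.start() for m in regex.finditer(sequence)]


# Pattern from the abstract: each bracketed class accepts any one of its residues.
PATTERN = "[FY]DPC[LIM][ASG]C[ASG]"

# Hypothetical toy sequence, made up for this example (not a real protein).
SEQUENCE = "MKFDPCLACSTTYDPCMGCGKK"

print(occurrences(PATTERN, SEQUENCE))  # [2, 12]
```

The two hits, FDPCLACS and YDPCMGCG, are different solid strings covered by the same degenerate pattern, which is why a single pattern with character classes can summarize many concrete occurrences.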


Bibliographic Details
Main Authors: Matteo Comin, Davide Verzotto
Format: Article
Language: English
Published: MDPI AG 2013-05-01
Series: Algorithms
Subjects: pattern discovery and filtering; degenerate patterns; analysis of biological data
Online Access: http://www.mdpi.com/1999-4893/6/2/352
author Matteo Comin
Davide Verzotto
collection DOAJ
description In biology, the notion of a degenerate pattern plays a central role in describing various phenomena. For example, protein active site patterns, like those contained in the PROSITE database, e.g., [FY]DPC[LIM][ASG]C[ASG], are, in general, represented by degenerate patterns with character classes. Researchers have developed several approaches over the years to discover degenerate patterns. Although these methods have been exhaustively and successfully tested on genomes and proteins, their output often far exceeds the size of the original input, making it hard to manage and to interpret in refined analyses that require manual inspection. In this paper, we discuss a characterization of degenerate patterns with character classes, without gaps, and we introduce the concept of pattern priority for comparing and ranking different patterns. We define the class of underlying patterns for filtering any set of degenerate patterns into a new set that is linear in the size of the input sequence. We present some preliminary results on the detection of subtle signals in protein families. Results show that our approach drastically reduces the number of patterns output by a protein analysis tool, while retaining the representative patterns.
first_indexed 2024-12-19T21:59:35Z
format Article
id doaj.art-69b3fc78e9f0429daca5c0b4107c4056
institution Directory Open Access Journal
issn 1999-4893
language English
last_indexed 2024-12-19T21:59:35Z
publishDate 2013-05-01
publisher MDPI AG
record_format Article
series Algorithms
spelling doaj.art-69b3fc78e9f0429daca5c0b4107c4056 | 2022-12-21T20:04:11Z | eng | MDPI AG | Algorithms | 1999-4893 | 2013-05-01 | 6 | 2 | 352 | 370 | 10.3390/a6020352 | Filtering Degenerate Patterns with Application to Protein Sequence Analysis | Matteo Comin | Davide Verzotto | http://www.mdpi.com/1999-4893/6/2/352 | pattern discovery and filtering | degenerate patterns | analysis of biological data
title Filtering Degenerate Patterns with Application to Protein Sequence Analysis
topic pattern discovery and filtering
degenerate patterns
analysis of biological data
url http://www.mdpi.com/1999-4893/6/2/352