Quantiprot - a Python package for quantitative analysis of protein sequences

Abstract. Background: The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of this approach is that quantitative properties define a multidimensio...

Bibliographic Details
Main Authors: Bogumił M. Konopka, Marta Marciniak, Witold Dyrka
Format: Article
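Language: English
Published: BMC 2017-07-01
Series: BMC Bioinformatics
Subjects: Protein sequence analysis; Python package; Quantitative properties; Quantitative recurrence analysis; n-grams
Online Access: http://link.springer.com/article/10.1186/s12859-017-1751-4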
author Bogumił M. Konopka
Marta Marciniak
Witold Dyrka
collection DOAJ
description Abstract. Background: The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of this approach is that quantitative properties define a multidimensional solution space in which sequences can be related to each other and differences can be meaningfully interpreted. Results: Quantiprot is a Python software package that provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or from physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams, and computes the Zipf’s law coefficient. Conclusions: We propose three main fields of application for the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
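The kinds of measures named in the abstract can be illustrated with a minimal sketch in plain Python. This is not Quantiprot's API: the function names `mean_property`, `ngram_counts`, and `zipf_coefficient` are hypothetical, the Kyte-Doolittle hydropathy scale is used only as an example of a physico-chemical property, and the Zipf coefficient is estimated here as the least-squares slope of log frequency against log rank, which is one common convention and may differ from the package's definition. The example sequence is arbitrary.

```python
from collections import Counter
import math

# Kyte-Doolittle hydropathy scale, used here only as an example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_property(sequence, scale):
    """Average a per-residue property over the sequence (hypothetical helper)."""
    return sum(scale[aa] for aa in sequence) / len(sequence)


def ngram_counts(sequence, n):
    """Count overlapping n-grams in the sequence (hypothetical helper)."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def zipf_coefficient(counts):
    """Least-squares slope of log(frequency) versus log(rank).

    Assumption: this is one common estimator of a Zipf's law coefficient,
    not necessarily the one implemented in Quantiprot.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Arbitrary example sequence made of standard amino-acid letters.
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
features = (
    mean_property(sequence, KYTE_DOOLITTLE),
    zipf_coefficient(ngram_counts(sequence, 2)),
)
print(features)
```

Each sequence is thereby mapped to a point in a small feature space, which is the setting the abstract has in mind for alignment-free comparison and clustering.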
first_indexed 2024-04-13T13:36:38Z
format Article
id doaj.art-28deb87f318d41669d3292eb7338e004
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-13T13:36:38Z
publishDate 2017-07-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-28deb87f318d41669d3292eb7338e004 2022-12-22T02:44:45Z eng BMC BMC Bioinformatics 1471-2105 2017-07-01 18116 10.1186/s12859-017-1751-4
Bogumił M. Konopka (Katedra Inżynierii Biomedycznej, Wydział Podstawowych Problemów Techniki, Politechnika Wrocławska)
Marta Marciniak (Katedra Inżynierii Biomedycznej, Wydział Podstawowych Problemów Techniki, Politechnika Wrocławska)
Witold Dyrka (Katedra Inżynierii Biomedycznej, Wydział Podstawowych Problemów Techniki, Politechnika Wrocławska)
title Quantiprot - a Python package for quantitative analysis of protein sequences
topic Protein sequence analysis
Python package
Quantitative properties
Quantitative recurrence analysis
n-grams
url http://link.springer.com/article/10.1186/s12859-017-1751-4