MLgsc: A Maximum-Likelihood General Sequence Classifier.
We present a software package for classifying protein or nucleotide sequences to user-specified sets of reference sequences. The software trains a model using a multiple sequence alignment and a phylogenetic tree, both supplied by the user. The latter is used to guide model construction and as a decis...
Main Authors: | Thomas Junier, Vincent Hervé, Tina Wunderlin, Pilar Junier |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2015-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0129384 |
_version_ | 1818584224319406080 |
---|---|
author | Thomas Junier Vincent Hervé Tina Wunderlin Pilar Junier |
author_facet | Thomas Junier Vincent Hervé Tina Wunderlin Pilar Junier |
author_sort | Thomas Junier |
collection | DOAJ |
description | We present a software package for classifying protein or nucleotide sequences to user-specified sets of reference sequences. The software trains a model using a multiple sequence alignment and a phylogenetic tree, both supplied by the user. The latter is used to guide model construction and as a decision tree to speed up the classification process. The software was evaluated on all the 16S rRNA gene sequences of the reference dataset found in the GreenGenes database. On this dataset, the software was shown to achieve an error rate of around 1% at genus level. Examples of applications based on the nitrogenase subunit NifH gene and a protein-coding gene found in endospore-forming Firmicutes are also presented. The programs in the package have a simple, straightforward command-line interface for the Unix shell, and are free and open-source. The package has minimal dependencies and thus can be easily integrated into command-line-based classification pipelines. |
first_indexed | 2024-12-16T08:17:46Z |
format | Article |
id | doaj.art-67e72276d164422981a639e0d759c3c8 |
institution | Directory of Open Access Journals |
issn | 1932-6203 |
language | English |
last_indexed | 2024-12-16T08:17:46Z |
publishDate | 2015-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-67e72276d164422981a639e0d759c3c8 2022-12-21T22:38:11Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2015-01-01 10 7 e0129384 10.1371/journal.pone.0129384 MLgsc: A Maximum-Likelihood General Sequence Classifier. Thomas Junier Vincent Hervé Tina Wunderlin Pilar Junier We present a software package for classifying protein or nucleotide sequences to user-specified sets of reference sequences. The software trains a model using a multiple sequence alignment and a phylogenetic tree, both supplied by the user. The latter is used to guide model construction and as a decision tree to speed up the classification process. The software was evaluated on all the 16S rRNA gene sequences of the reference dataset found in the GreenGenes database. On this dataset, the software was shown to achieve an error rate of around 1% at genus level. Examples of applications based on the nitrogenase subunit NifH gene and a protein-coding gene found in endospore-forming Firmicutes are also presented. The programs in the package have a simple, straightforward command-line interface for the Unix shell, and are free and open-source. The package has minimal dependencies and thus can be easily integrated into command-line-based classification pipelines. https://doi.org/10.1371/journal.pone.0129384 |
spellingShingle | Thomas Junier Vincent Hervé Tina Wunderlin Pilar Junier MLgsc: A Maximum-Likelihood General Sequence Classifier. PLoS ONE |
title | MLgsc: A Maximum-Likelihood General Sequence Classifier. |
title_full | MLgsc: A Maximum-Likelihood General Sequence Classifier. |
title_fullStr | MLgsc: A Maximum-Likelihood General Sequence Classifier. |
title_full_unstemmed | MLgsc: A Maximum-Likelihood General Sequence Classifier. |
title_short | MLgsc: A Maximum-Likelihood General Sequence Classifier. |
title_sort | mlgsc a maximum likelihood general sequence classifier |
url | https://doi.org/10.1371/journal.pone.0129384 |
work_keys_str_mv | AT thomasjunier mlgscamaximumlikelihoodgeneralsequenceclassifier AT vincentherve mlgscamaximumlikelihoodgeneralsequenceclassifier AT tinawunderlin mlgscamaximumlikelihoodgeneralsequenceclassifier AT pilarjunier mlgscamaximumlikelihoodgeneralsequenceclassifier |