MLgsc: A Maximum-Likelihood General Sequence Classifier.

Bibliographic Details
Main Authors: Thomas Junier, Vincent Hervé, Tina Wunderlin, Pilar Junier
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2015-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0129384
Abstract: We present a software package for classifying protein or nucleotide sequences to user-specified sets of reference sequences. The software trains a model using a multiple sequence alignment and a phylogenetic tree, both supplied by the user. The tree is used to guide model construction and also serves as a decision tree that speeds up classification. The software was evaluated on all 16S rRNA gene sequences of the reference dataset in the GreenGenes database. On this dataset, it achieved an error rate of around 1% at the genus level. Example applications based on the nitrogenase subunit NifH gene and on a protein-coding gene found in endospore-forming Firmicutes are also presented. The programs in the package have a simple command-line interface for the Unix shell and are free and open-source. The package has minimal dependencies and can therefore be easily integrated into command-line-based classification pipelines.
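The abstract does not spell out how the model is built, so the following is only a minimal sketch of the general idea, not the MLgsc implementation. It assumes a simple independent-column model, where each tree node stores per-column residue log-probabilities estimated (with pseudocounts) from all reference sequences below it. It also assumes queries are already aligned to the reference alignment and use only the characters in the hypothetical `ALPHABET`. Classification then descends the tree, choosing at each node the child whose model gives the query the highest log-likelihood. The names `Node`, `column_model`, `train` and `classify` are illustrative only.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

# Assumption: aligned nucleotide sequences using only these symbols ("-" = gap).
ALPHABET = "ACGT-"


def column_model(seqs, pseudocount=1.0):
    """Per-column log-probabilities of each symbol, with additive pseudocounts."""
    total = len(seqs) + pseudocount * len(ALPHABET)
    model = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        model.append({c: math.log((counts[c] + pseudocount) / total) for c in ALPHABET})
    return model


def log_likelihood(model, query):
    """Log-likelihood of an aligned query, treating columns as independent."""
    return sum(col[ch] for col, ch in zip(model, query))


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    seqs: list = field(default_factory=list)   # reference sequences (leaves only)
    model: list = field(default_factory=list)  # filled in by train()

    def all_seqs(self):
        """All reference sequences in the subtree rooted at this node."""
        if not self.children:
            return list(self.seqs)
        return [s for child in self.children for s in child.all_seqs()]


def train(node):
    """Build a column model for every node from the sequences beneath it."""
    node.model = column_model(node.all_seqs())
    for child in node.children:
        train(child)


def classify(root, query):
    """Walk from the root to a leaf, picking the best-scoring child at each step."""
    node, path = root, [root.name]
    while node.children:
        node = max(node.children, key=lambda c: log_likelihood(c.model, query))
        path.append(node.name)
    return path


if __name__ == "__main__":
    root = Node("root", children=[
        Node("A", children=[Node("A1", seqs=["AAAA", "AAAT"]),
                            Node("A2", seqs=["AACC", "AACG"])]),
        Node("B", seqs=["GGGG", "GGGT"]),
    ])
    train(root)
    print(classify(root, "AACC"))  # → ['root', 'A', 'A2']
```

Using the tree as a decision tree means a query is scored only against the children of the nodes on one root-to-leaf path, not against every leaf model. That is why the descent is cheaper than an exhaustive search, at the cost that an early wrong choice cannot be revisited.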
ISSN: 1932-6203