MarFERReT, an open-source, version-controlled reference library of marine microbial eukaryote functional genes
Abstract: Metatranscriptomics generates large volumes of sequence data about transcribed genes in natural environments. Taxonomic annotation of these datasets depends on the availability of curated reference sequences. For marine microbial eukaryotes, current reference libraries are limited by gaps in se...
Main Authors: | R. D. Groussman, S. Blaskowski, S. N. Coesel, E. V. Armbrust |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-12-01 |
Series: | Scientific Data |
Online Access: | https://doi.org/10.1038/s41597-023-02842-4 |
_version_ | 1797377209145491456 |
---|---|
author | R. D. Groussman; S. Blaskowski; S. N. Coesel; E. V. Armbrust |
author_sort | R. D. Groussman |
collection | DOAJ |
description | Abstract: Metatranscriptomics generates large volumes of sequence data about transcribed genes in natural environments. Taxonomic annotation of these datasets depends on the availability of curated reference sequences. For marine microbial eukaryotes, current reference libraries are limited by gaps in sequenced organism diversity and by barriers to updating libraries with new sequence data, leaving only about half of eukaryotic environmental transcripts taxonomically annotated. Here, we introduce Marine Functional EukaRyotic Reference Taxa (MarFERReT), a marine microbial eukaryotic sequence library designed for taxonomic annotation of eukaryotic metatranscriptomes. We gathered 902 publicly accessible marine eukaryote genomes and transcriptomes, assessed them for sequence quality and cross-contamination, and selected 800 validated entries for inclusion in MarFERReT. Version 1.1 of MarFERReT contains reference sequences from these 800 marine eukaryotic genomes and transcriptomes, covering 453 species- and strain-level taxa and totaling nearly 28 million protein sequences with associated NCBI and PR2 Taxonomy identifiers and Pfam functional annotations. The MarFERReT project repository hosts containerized build scripts, documentation on installation and use case examples, and information on new versions of MarFERReT. |
first_indexed | 2024-03-08T19:49:29Z |
format | Article |
id | doaj.art-9078559df11e45fe821783b28484cae7 |
institution | Directory of Open Access Journals |
issn | 2052-4463 |
language | English |
last_indexed | 2024-03-08T19:49:29Z |
publishDate | 2023-12-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Data |
spelling | doaj.art-9078559df11e45fe821783b28484cae7; 2023-12-24T12:09:31Z; eng; Nature Portfolio; Scientific Data; 2052-4463; 2023-12-01; 10; 1; 1; 19; 10.1038/s41597-023-02842-4; MarFERReT, an open-source, version-controlled reference library of marine microbial eukaryote functional genes; R. D. Groussman (0), S. Blaskowski (1), S. N. Coesel (2), E. V. Armbrust (3); (0) School of Oceanography, University of Washington; (1) School of Oceanography, University of Washington; (2) School of Oceanography, University of Washington; (3) School of Oceanography, University of Washington; Abstract: Metatranscriptomics generates large volumes of sequence data about transcribed genes in natural environments. Taxonomic annotation of these datasets depends on availability of curated reference sequences. For marine microbial eukaryotes, current reference libraries are limited by gaps in sequenced organism diversity and barriers to updating libraries with new sequence data, resulting in taxonomic annotation of about half of eukaryotic environmental transcripts. Here, we introduce Marine Functional EukaRyotic Reference Taxa (MarFERReT), a marine microbial eukaryotic sequence library designed for use with taxonomic annotation of eukaryotic metatranscriptomes. We gathered 902 publicly accessible marine eukaryote genomes and transcriptomes and assessed their sequence quality and cross-contamination issues, selecting 800 validated entries for inclusion in MarFERReT. Version 1.1 of MarFERReT contains reference sequences from 800 marine eukaryotic genomes and transcriptomes, covering 453 species- and strain-level taxa, totaling nearly 28 million protein sequences with associated NCBI and PR2 Taxonomy identifiers and Pfam functional annotations. The MarFERReT project repository hosts containerized build scripts, documentation on installation and use case examples, and information on new versions of MarFERReT.; https://doi.org/10.1038/s41597-023-02842-4 |
title | MarFERReT, an open-source, version-controlled reference library of marine microbial eukaryote functional genes |
url | https://doi.org/10.1038/s41597-023-02842-4 |