MarFERReT, an open-source, version-controlled reference library of marine microbial eukaryote functional genes

Bibliographic Details
Main Authors: R. D. Groussman, S. Blaskowski, S. N. Coesel, E. V. Armbrust
Format: Article
Language: English
Published: Nature Portfolio 2023-12-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-023-02842-4
collection DOAJ
description Abstract: Metatranscriptomics generates large volumes of sequence data about transcribed genes in natural environments. Taxonomic annotation of these datasets depends on availability of curated reference sequences. For marine microbial eukaryotes, current reference libraries are limited by gaps in sequenced organism diversity and barriers to updating libraries with new sequence data, resulting in taxonomic annotation of about half of eukaryotic environmental transcripts. Here, we introduce Marine Functional EukaRyotic Reference Taxa (MarFERReT), a marine microbial eukaryotic sequence library designed for use with taxonomic annotation of eukaryotic metatranscriptomes. We gathered 902 publicly accessible marine eukaryote genomes and transcriptomes and assessed their sequence quality and cross-contamination issues, selecting 800 validated entries for inclusion in MarFERReT. Version 1.1 of MarFERReT contains reference sequences from 800 marine eukaryotic genomes and transcriptomes, covering 453 species- and strain-level taxa, totaling nearly 28 million protein sequences with associated NCBI and PR2 Taxonomy identifiers and Pfam functional annotations. The MarFERReT project repository hosts containerized build scripts, documentation on installation and use case examples, and information on new versions of MarFERReT.
first_indexed 2024-03-08T19:49:29Z
id doaj.art-9078559df11e45fe821783b28484cae7
institution Directory of Open Access Journals
issn 2052-4463
last_indexed 2024-03-08T19:49:29Z
spelling doaj.art-9078559df11e45fe821783b28484cae7; 2023-12-24T12:09:31Z; eng; Nature Portfolio; Scientific Data; 2052-4463; 2023-12-01; 10; 1; 1; 19; 10.1038/s41597-023-02842-4; R. D. Groussman, S. Blaskowski, S. N. Coesel, E. V. Armbrust (all: School of Oceanography, University of Washington)