A de novo Full-Length mRNA Transcriptome Generated From Hybrid-Corrected PacBio Long-Reads Improves the Transcript Annotation and Identifies Thousands of Novel Splice Variants in Atlantic Salmon

Atlantic salmon (Salmo salar) is a major species produced in world aquaculture and an important vertebrate model organism for studying the process of rediploidization following whole genome duplication events (Ss4R, 80 mya). The current Salmo salar transcriptome is largely generated from genome sequ...

Bibliographic Details
Main Authors: Sigmund Ramberg, Bjørn Høyheim, Tone-Kari Knutsdatter Østbye, Rune Andreassen
Format: Article
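Language: English
Published: Frontiers Media S.A. 2021-04-01
Series: Frontiers in Genetics
Subjects: Atlantic salmon; transcriptome; full-length mRNA; hybrid error correction; PacBio Iso-seq; Illumina sequencing
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2021.656334/full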
author Sigmund Ramberg
Bjørn Høyheim
Tone-Kari Knutsdatter Østbye
Rune Andreassen
collection DOAJ
description Atlantic salmon (Salmo salar) is a major species produced in world aquaculture and an important vertebrate model organism for studying the process of rediploidization following whole genome duplication events (Ss4R, 80 mya). The current Salmo salar transcriptome is largely generated from genome sequence-based in silico predictions supported by ESTs and short-read sequencing data. However, recent progress in long-read sequencing technologies now allows full-length transcript sequencing from single RNA molecules. This study provides a de novo full-length mRNA transcriptome from liver, head-kidney and gill materials. A pipeline was developed based on Iso-seq sequencing of long-reads on the PacBio platform (HQ reads), followed by error correction of the HQ reads using short-reads from the Illumina platform. The pipeline successfully processed more than 1.5 million long-reads and more than 900 million short-reads into error-corrected HQ reads. A surprisingly high percentage (32%) represented expressed interspersed repeats, while the remainder were processed into 71 461 full-length mRNAs from 23 071 loci. Each transcript was supported by several single-molecule long-read sequences and at least three short-reads, ensuring high sequence accuracy. On average, each gene was represented by three isoforms. Comparisons to the current Atlantic salmon transcripts in the RefSeq database showed that the long-read transcriptome validated 25% of all known transcripts, while the remaining full-length transcripts were novel isoforms, but few were transcripts from novel genes. A comparison to the current genome assembly indicates that the long-read transcriptome may aid in improving transcript annotation as well as provide long-read linkage information useful for improving the genome assembly. More than 80% of transcripts were assigned GO terms, and thousands of transcripts were from genes or splice-variants expressed in an organ-specific manner, demonstrating that hybrid error-corrected long-read transcriptomes may be applied to study genes and splice-variants expressed in certain organs or conditions (e.g., challenge materials). In conclusion, this is the single largest contribution of full-length mRNAs in Atlantic salmon. The results will be of great value to salmon genomics research, and the pipeline outlined may be applied to generate additional de novo transcriptomes in Atlantic salmon or used for similar projects in other species.
first_indexed 2024-12-17T07:23:36Z
format Article
id doaj.art-67ccaee93eb147ae94158cfd76f2efdf
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-12-17T07:23:36Z
publishDate 2021-04-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-67ccaee93eb147ae94158cfd76f2efdf
2022-12-21T21:58:41Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2021-04-01
12
10.3389/fgene.2021.656334
656334
Sigmund Ramberg 0
Bjørn Høyheim 1
Tone-Kari Knutsdatter Østbye 2
Rune Andreassen 3
Department of Life Sciences and Health, Faculty of Health Sciences, OsloMet – Oslo Metropolitan University, Oslo, Norway
Department of Preclinical Sciences and Pathology, Faculty of Veterinary Medicine, Norwegian University of Life Sciences, Ås, Norway
Nofima (Norwegian Institute of Food, Fisheries and Aquaculture Research), Ås, Norway
Department of Life Sciences and Health, Faculty of Health Sciences, OsloMet – Oslo Metropolitan University, Oslo, Norway
title A de novo Full-Length mRNA Transcriptome Generated From Hybrid-Corrected PacBio Long-Reads Improves the Transcript Annotation and Identifies Thousands of Novel Splice Variants in Atlantic Salmon
topic Atlantic salmon
transcriptome
full-length mRNA
hybrid error correction
PacBio Iso-seq
Illumina sequencing
url https://www.frontiersin.org/articles/10.3389/fgene.2021.656334/full