Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog

Abstract Background In eukaryote transcriptomes, a significant amount of transcript diversity comes from genes’ capacity to generate different transcripts through alternative splicing. Identifying orthologous alternative transcripts across multiple species is of particular interest for genome annota...

Bibliographic Details
Main Authors: Nicolas Guillaudeux, Catherine Belleannée, Samuel Blanquart
Format: Article
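Language: English
Published: BMC 2022-03-01
Series: BMC Genomics
Subjects: Orthology; Transcript orthology; Transcriptome prediction; Alternative splicing; Alternative transcription; Comparative genomics
Online Access: https://doi.org/10.1186/s12864-022-08429-4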
_version_ 1818774542015791104
author Nicolas Guillaudeux
Catherine Belleannée
Samuel Blanquart
collection DOAJ
description Abstract. Background: In eukaryote transcriptomes, a significant amount of transcript diversity comes from the capacity of genes to generate different transcripts through alternative splicing. Identifying orthologous alternative transcripts across multiple species is of particular interest for genome annotators. However, there is no formal definition of transcript orthology based on conservation of the splicing structure. Likewise, there is no public benchmark dataset providing groups of orthologous transcripts that share a conserved splicing structure. Results: We introduced a formal definition of splicing structure orthology and predicted transcript orthologs in human, mouse and dog. Applying a selective strategy, we analyzed 2,167 genes and their 18,109 known transcripts and identified a set of 253 gene orthologs that shared a conserved splicing structure in all three species. We predicted 6,861 transcript CDSs (coding sequences), mainly for dog, an emerging model species. Each predicted transcript was an ortholog of a known transcript: both share the same CDS splicing structure. Evidence for the existence of the predicted CDSs was found in external data. Conclusions: We generated a dataset of 253 gene triplets, structurally conserved and sharing all their CDSs in human, mouse and dog, which correspond to 879 triplets of spliced CDS orthologs. We have released the dataset both as an SQL database and as tabulated files. The data consist of the 879 CDS orthology groups with their detailed splicing structures, and the predicted CDSs together with their experimental evidence. The 6,861 predicted CDSs are provided in GTF files. Our data may help to compare highly conserved genes across the three species, support comparative transcriptomics at the isoform level, or serve to benchmark splice aligners and methods that identify splicing orthologs. The data are available at https://data-access.cesgo.org/index.php/s/V97GXxOS66NqTkZ .
first_indexed 2024-12-18T10:42:48Z
format Article
id doaj.art-959ba97ff767477c9ddc09157756ff43
institution Directory Open Access Journal
issn 1471-2164
language English
last_indexed 2024-12-18T10:42:48Z
publishDate 2022-03-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-959ba97ff767477c9ddc09157756ff43 | 2022-12-21T21:10:36Z | eng | BMC | BMC Genomics | 1471-2164 | 2022-03-01 | 231114 | 10.1186/s12864-022-08429-4 | Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog | Nicolas Guillaudeux (0); Catherine Belleannée (1); Samuel Blanquart (2) | Univ Rennes, Inria, CNRS, IRISA (0); Univ Rennes, Inria, CNRS, IRISA (1); Univ Rennes, Inria, CNRS, IRISA (2) | https://doi.org/10.1186/s12864-022-08429-4 | Orthology; Transcript orthology; Transcriptome prediction; Alternative splicing; Alternative transcription; Comparative genomics
title Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog
topic Orthology
Transcript orthology
Transcriptome prediction
Alternative splicing
Alternative transcription
Comparative genomics
url https://doi.org/10.1186/s12864-022-08429-4