Enriched atlas of lncRNA and protein-coding genes for the GRCg7b chicken assembly and its functional annotation across 47 tissues

Abstract Gene atlases for livestock are steadily improving thanks to new genome assemblies and new expression data improving the gene annotation. However, gene content varies across databases due to differences in RNA sequencing data and bioinformatics pipelines, especially for long non-coding RNAs...


Bibliographic Details
Main Authors: Fabien Degalez, Mathieu Charles, Sylvain Foissac, Haijuan Zhou, Dailu Guan, Lingzhao Fang, Christophe Klopp, Coralie Allain, Laetitia Lagoutte, Frédéric Lecerf, Hervé Acloque, Elisabetta Giuffra, Frédérique Pitel, Sandrine Lagarrigue
Format: Article
Language: English
Published: Nature Portfolio 2024-03-01
Series: Scientific Reports
Subjects: Gene atlas, Long non coding RNAs, Chicken, Genome annotation, Tissue specificity, Co-expression
Online Access: https://doi.org/10.1038/s41598-024-56705-y
collection DOAJ
description Abstract Gene atlases for livestock are steadily improving thanks to new genome assemblies and new expression data improving the gene annotation. However, gene content varies across databases due to differences in RNA sequencing data and bioinformatics pipelines, especially for long non-coding RNAs (lncRNAs), which have higher tissue and developmental specificity and are harder to identify consistently than protein-coding genes (PCGs). As done previously in 2020 for chicken assemblies galgal5 and GRCg6a, we provide a new lncRNA-enriched gene atlas for the latest GRCg7b chicken assembly, integrating the "NCBI RefSeq" and "EMBL-EBI Ensembl/GENCODE" reference annotations and other resources such as FAANG and NONCODE. As a result, the number of PCGs increases from 18,022 (RefSeq) and 17,007 (Ensembl) to 24,102, and that of lncRNAs from 5789 (RefSeq) and 11,944 (Ensembl) to 44,428. Using 1400 public RNA-seq transcriptomes representing 47 tissues, we provided expression evidence for 35,257 (79%) lncRNAs and 22,468 (93%) PCGs, supporting the relevance of this atlas. Further characterization, including tissue specificity, sex-differential expression and gene configurations, is provided. We also identified conserved miRNA-hosting genes with human counterparts, suggesting common function. The annotated atlas is available at gega.sigenae.org
first_indexed 2024-04-24T19:55:45Z
format Article
id doaj.art-289590f9430a4752b5aaa9be20f46cfd
institution Directory of Open Access Journals
issn 2045-2322
language English
last_indexed 2024-04-24T19:55:45Z
publishDate 2024-03-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-289590f9430a4752b5aaa9be20f46cfd 2024-03-24T12:20:49Z eng
Nature Portfolio, Scientific Reports, 2045-2322, 2024-03-01, volume 14, issue 1, pages 1-18
10.1038/s41598-024-56705-y
Fabien Degalez (PEGASE, INRAE, Institut Agro)
Mathieu Charles (INRAE, BioinfOmics, GenoToul Bioinformatics facility, Sigenae, Université Fédérale de Toulouse)
Sylvain Foissac (GenPhySE, Université de Toulouse, INRAE, ENVT)
Haijuan Zhou (University of California Davis)
Dailu Guan (University of California Davis)
Lingzhao Fang (Aarhus University)
Christophe Klopp (INRAE, BioinfOmics, GenoToul Bioinformatics facility, Sigenae, Université Fédérale de Toulouse)
Coralie Allain (PEGASE, INRAE, Institut Agro)
Laetitia Lagoutte (PEGASE, INRAE, Institut Agro)
Frédéric Lecerf (PEGASE, INRAE, Institut Agro)
Hervé Acloque (INRAE, AgroParisTech, GABI, Paris-Saclay University)
Elisabetta Giuffra (INRAE, AgroParisTech, GABI, Paris-Saclay University)
Frédérique Pitel (GenPhySE, Université de Toulouse, INRAE, ENVT)
Sandrine Lagarrigue (PEGASE, INRAE, Institut Agro)
title Enriched atlas of lncRNA and protein-coding genes for the GRCg7b chicken assembly and its functional annotation across 47 tissues
topic Gene atlas
Long non coding RNAs
Chicken
Genome annotation
Tissue specificity
Co-expression
url https://doi.org/10.1038/s41598-024-56705-y