Gene Expression Datasets for Two Versions of the Saccharum spontaneum AP85-441 Genome

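Sugarcane is a species of tall grass with high biomass and sucrose production, and the world’s largest crop by production quantity. Evolutionary adaptation to its environment and human-directed breeding have resulted in a complex autopolyploid genome. Few efforts to document this organism’s gene co-expression and annotation have been reported in the literature, and those that are available use different gene identifiers that cannot be easily associated across studies. This data descriptor paper presents a dataset that consolidates expression matrices of two Saccharum spontaneum AP85-441 genome versions, together with an algorithm implemented in Python to obtain this dataset automatically. The data are processed from the allele-level information of the two sources: BLASTn is used bidirectionally to suggest feasible mappings between the two sets of alleles, and a graph-matching optimization algorithm maximizes the global identity and uniqueness of genes. Association tables are then used to consolidate the expression values from alleles to genes. The contributed expression matrices comprise 96 experiments and 109,050 and 35,516 from the two genome versions. They can significantly reduce the computational cost of further research on, e.g., sugarcane co-expression network generation, functional annotation prediction, and stress-specific gene identification.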
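
The processing steps named in the description (bidirectional BLASTn hits, a one-to-one mapping between the two allele sets, and consolidation of allele-level expression into gene-level expression) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the function names, the toy identifiers, and the identity scores are invented for illustration. The greedy matching is a simple stand-in for the graph-matching optimization, which this record does not detail, and summing allele values per gene is likewise an assumption.

```python
def reciprocal_candidates(hits_ab, hits_ba):
    """Keep allele pairs reported in both BLAST directions.

    hits_ab maps (allele_a, allele_b) -> percent identity (A queried against B);
    hits_ba maps (allele_b, allele_a) -> percent identity (B queried against A).
    Each surviving pair is scored by the mean of the two identities.
    """
    scores = {}
    for (a, b), ident_ab in hits_ab.items():
        ident_ba = hits_ba.get((b, a))
        if ident_ba is not None:
            scores[(a, b)] = (ident_ab + ident_ba) / 2
    return scores


def greedy_one_to_one(scores):
    """Greedy stand-in for the matching step (assumption, not the paper's method).

    Pairs are visited from highest to lowest score; a pair is accepted only
    if neither allele has been used yet, so the result is one-to-one.
    """
    used_a, used_b, mapping = set(), set(), {}
    for (a, b), _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if a not in used_a and b not in used_b:
            mapping[a] = b
            used_a.add(a)
            used_b.add(b)
    return mapping


def consolidate(allele_expr, allele_to_gene):
    """Sum allele-level expression vectors into gene-level vectors.

    Summation is an assumed aggregation rule; alleles that are missing
    from the association table are skipped.
    """
    gene_expr = {}
    for allele, values in allele_expr.items():
        gene = allele_to_gene.get(allele)
        if gene is None:
            continue
        if gene not in gene_expr:
            gene_expr[gene] = list(values)
        else:
            gene_expr[gene] = [x + y for x, y in zip(gene_expr[gene], values)]
    return gene_expr


# Toy data: two alleles per genome version, two experiments per allele.
hits_ab = {("a1", "b1"): 99.0, ("a1", "b2"): 95.0, ("a2", "b2"): 97.0}
hits_ba = {("b1", "a1"): 98.0, ("b2", "a1"): 96.0, ("b2", "a2"): 97.0}
mapping = greedy_one_to_one(reciprocal_candidates(hits_ab, hits_ba))

allele_expr = {"g1-1A": [1.0, 2.0], "g1-2B": [0.5, 0.5], "g2-1A": [3.0, 0.0]}
allele_to_gene = {"g1-1A": "g1", "g1-2B": "g1", "g2-1A": "g2"}
gene_expr = consolidate(allele_expr, allele_to_gene)
```

A greedy pass is easy to read but does not guarantee a globally optimal assignment; an exact maximum-weight bipartite matching would be the closer analogue of an optimization that maximizes global identity.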

Bibliographic Details
Main Authors: Nicolás López-Rozo, Mauricio Ramirez-Castrillon, Miguel Romero, Jorge Finke, Camilo Rocha
Format: Article
Language: English
Published: MDPI AG 2022-12-01
Series: Data
Subjects: sugarcane, expression matrix, allele expression, graph flow
Online Access: https://www.mdpi.com/2306-5729/8/1/1
_version_ 1797946591877791744
author Nicolás López-Rozo
Mauricio Ramirez-Castrillon
Miguel Romero
Jorge Finke
Camilo Rocha
author_sort Nicolás López-Rozo
collection DOAJ
description Sugarcane is a species of tall grass with high biomass and sucrose production, and the world’s largest crop by production quantity. Evolutionary adaptation to its environment and human-directed breeding have resulted in a complex autopolyploid genome. Few efforts to document this organism’s gene co-expression and annotation have been reported in the literature, and those that are available use different gene identifiers that cannot be easily associated across studies. This data descriptor paper presents a dataset that consolidates expression matrices of two Saccharum spontaneum AP85-441 genome versions, together with an algorithm implemented in Python to obtain this dataset automatically. The data are processed from the allele-level information of the two sources: BLASTn is used bidirectionally to suggest feasible mappings between the two sets of alleles, and a graph-matching optimization algorithm maximizes the global identity and uniqueness of genes. Association tables are then used to consolidate the expression values from alleles to genes. The contributed expression matrices comprise 96 experiments and 109,050 and 35,516 from the two genome versions. They can significantly reduce the computational cost of further research on, e.g., sugarcane co-expression network generation, functional annotation prediction, and stress-specific gene identification.
first_indexed 2024-04-10T21:13:28Z
format Article
id doaj.art-b6a5d88b09cf4e7d9a88027fc496d506
institution Directory Open Access Journal
issn 2306-5729
language English
last_indexed 2024-04-10T21:13:28Z
publishDate 2022-12-01
publisher MDPI AG
record_format Article
series Data
spelling doaj.art-b6a5d88b09cf4e7d9a88027fc496d506 2023-01-20T14:43:57Z eng MDPI AG Data 2306-5729 2022-12-01 8 1 1 10.3390/data8010001
Gene Expression Datasets for Two Versions of the Saccharum spontaneum AP85-441 Genome
Nicolás López-Rozo (0), Mauricio Ramirez-Castrillon (1), Miguel Romero (2), Jorge Finke (3), Camilo Rocha (4)
0 Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Cali 760031, Colombia
1 OMICAS Program, Pontificia Universidad Javeriana, Cali 760031, Colombia
2 Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Cali 760031, Colombia
3 Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Cali 760031, Colombia
4 Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Cali 760031, Colombia
https://www.mdpi.com/2306-5729/8/1/1
sugarcane; expression matrix; allele expression; graph flow
title Gene Expression Datasets for Two Versions of the Saccharum spontaneum AP85-441 Genome
title_sort gene expression datasets for two versions of the i saccharum spontaneum i ap85 441 genome
topic sugarcane
expression matrix
allele expression
graph flow
url https://www.mdpi.com/2306-5729/8/1/1
work_keys_str_mv AT nicolaslopezrozo geneexpressiondatasetsfortwoversionsoftheisaccharumspontaneumiap85441genome
AT mauricioramirezcastrillon geneexpressiondatasetsfortwoversionsoftheisaccharumspontaneumiap85441genome
AT miguelromero geneexpressiondatasetsfortwoversionsoftheisaccharumspontaneumiap85441genome
AT jorgefinke geneexpressiondatasetsfortwoversionsoftheisaccharumspontaneumiap85441genome
AT camilorocha geneexpressiondatasetsfortwoversionsoftheisaccharumspontaneumiap85441genome