Gene Expression Datasets for Two Versions of the <i>Saccharum spontaneum</i> AP85-441 Genome
Main Authors: | Nicolás López-Rozo, Mauricio Ramirez-Castrillon, Miguel Romero, Jorge Finke, Camilo Rocha |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-12-01 |
Series: | Data |
Online Access: | https://www.mdpi.com/2306-5729/8/1/1 |
Field | Value |
---|---|
author | Nicolás López-Rozo, Mauricio Ramirez-Castrillon, Miguel Romero, Jorge Finke, Camilo Rocha |
collection | DOAJ |
description | Sugarcane is a species of tall grass with high biomass and sucrose production, and it is the world’s largest crop by production quantity. Its adaptation to its evolutionary environment and its response to anthropogenic breeding have resulted in a complex autopolyploid genome. Few efforts to document this organism’s gene co-expression and annotation have been reported in the literature, and those that are available use different gene identifiers that cannot easily be associated across studies. This data descriptor paper presents a dataset that consolidates the expression matrices of two <i>Saccharum spontaneum</i> AP85-441 genome versions, together with an algorithm implemented in Python that obtains this dataset automatically. The data are processed from the allele-level information of the two sources: BLASTn is run bidirectionally to suggest feasible mappings between the two sets of alleles, and a graph-matching optimization algorithm maximizes the global identity and uniqueness of genes. Association tables are then used to consolidate the expression values from alleles to genes. The contributed expression matrices comprise 96 experiments and 109,050 and 35,516 genes from the two genome versions. They can substantially reduce the computational cost of further research on, e.g., sugarcane co-expression network generation, functional annotation prediction, and stress-specific gene identification. |
first_indexed | 2024-04-10T21:13:28Z |
format | Article |
id | doaj.art-b6a5d88b09cf4e7d9a88027fc496d506 |
institution | Directory Open Access Journal |
issn | 2306-5729 |
language | English |
last_indexed | 2024-04-10T21:13:28Z |
publishDate | 2022-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Data |
doi | 10.3390/data8010001 |
volume / issue / article | 8 / 1 / 1 |
affiliations | Nicolás López-Rozo, Miguel Romero, Jorge Finke, Camilo Rocha: Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Cali 760031, Colombia; Mauricio Ramirez-Castrillon: OMICAS Program, Pontificia Universidad Javeriana, Cali 760031, Colombia |
title | Gene Expression Datasets for Two Versions of the <i>Saccharum spontaneum</i> AP85-441 Genome |
topic | sugarcane; expression matrix; allele expression; graph flow |
url | https://www.mdpi.com/2306-5729/8/1/1 |
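
The description names two mechanical steps that frame the pipeline: a bidirectional BLASTn search that proposes feasible allele mappings, and association tables that consolidate allele-level expression into gene-level expression. The Python below is a minimal sketch of those two steps under stated assumptions, not the authors' implementation. It assumes BLAST+ tabular output (`-outfmt 6`, where the first three columns are `qseqid`, `sseqid`, `pident`), treats a pair as feasible only when hits exist in both search directions, weights each pair by mean percent identity, and sums allele values per gene. The function names (`read_hits`, `feasible_pairs`, `consolidate`) and the summation rule are hypothetical. The graph-matching optimization that selects the final one-to-one mapping is not reproduced here.

```python
# Hypothetical sketch, not the authors' code. Assumptions: BLAST+ tabular
# output (-outfmt 6; columns 1-3 are qseqid, sseqid, pident), a pair is
# feasible only if hits exist in both directions, and allele values are
# summed per gene (the combination rule is an assumption).


def read_hits(lines):
    """Map (query, subject) to the best percent identity seen for that pair."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], fields[1])
        best[key] = max(float(fields[2]), best.get(key, 0.0))
    return best


def feasible_pairs(forward, reverse):
    """Keep (a, b) only if a hits b in the forward search (version A vs B)
    and b hits a in the reverse search (B vs A); weight = mean identity."""
    pairs = {}
    for (a, b), pid_ab in forward.items():
        pid_ba = reverse.get((b, a))
        if pid_ba is not None:
            pairs[(a, b)] = (pid_ab + pid_ba) / 2
    return pairs


def consolidate(allele_expr, allele_to_gene):
    """Sum allele-level rows into gene-level rows, one value per experiment.
    Alleles absent from the association table are skipped."""
    genes = {}
    for allele, values in allele_expr.items():
        gene = allele_to_gene.get(allele)
        if gene is None:
            continue
        totals = genes.setdefault(gene, [0.0] * len(values))
        genes[gene] = [t + v for t, v in zip(totals, values)]
    return genes


if __name__ == "__main__":
    fwd = read_hits(["a1\tx1\t100.0", "a2\tx2\t97.0"])
    rev = read_hits(["x1\ta1\t98.0"])
    print(feasible_pairs(fwd, rev))  # {('a1', 'x1'): 99.0}
    expr = {"a1": [1.0, 2.0], "a2": [3.0, 4.0], "a3": [5.0, 6.0]}
    print(consolidate(expr, {"a1": "g1", "a2": "g1", "a3": "g2"}))
    # {'g1': [4.0, 6.0], 'g2': [5.0, 6.0]}
```

Summation is only one plausible rule; this record does not state how allele values are combined into a gene value, so averaging or another rule may be what the published matrices actually use.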