Genomic data integration tutorial, a plant case study

Abstract. Background: The ongoing evolution of Next Generation Sequencing (NGS) technologies has led to the production of genomic data on a massive scale. While tools for genomic data integration and analysis are becoming increasingly available, the conceptual and analytical complexities still rep...

Bibliographic Details
Main Authors: Emile Mardoc, Mamadou Dia Sow, Sébastien Déjean, Jérôme Salse
Format: Article
Language: English
Published: BMC 2024-01-01
Series: BMC Genomics
Subjects: Omics, Integration, System, Biology
Online Access: https://doi.org/10.1186/s12864-023-09833-0
author Emile Mardoc
Mamadou Dia Sow
Sébastien Déjean
Jérôme Salse
author_sort Emile Mardoc
collection DOAJ
description Abstract. Background: The ongoing evolution of Next Generation Sequencing (NGS) technologies has led to the production of genomic data on a massive scale. While tools for genomic data integration and analysis are becoming increasingly available, the conceptual and analytical complexities still represent a great challenge in many biological contexts. Results: To address this issue, we describe a six-step tutorial on best practices in genomic data integration, consisting of (1) designing a data matrix; (2) formulating a specific biological question toward data description, selection and prediction; (3) selecting a tool adapted to the targeted questions; (4) preprocessing the data; (5) conducting preliminary analysis; and finally (6) executing genomic data integration. Conclusion: The tutorial has been tested and demonstrated on publicly available genomic data generated from poplar (Populus L.), a woody plant model. We also developed a new graphical output for the unsupervised multi-block analysis, cimDiablo_v2, available at https://forgemia.inra.fr/umr-gdec/omics-integration-on-poplar, which allows the selection of master drivers in genomic data variation and interplay.
first_indexed 2024-03-08T12:40:14Z
format Article
id doaj.art-4436600db7f7408e976679a3951ed4e6
institution Directory Open Access Journal
issn 1471-2164
language English
last_indexed 2024-03-08T12:40:14Z
publishDate 2024-01-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-4436600db7f7408e976679a3951ed4e6
2024-01-21T12:11:43Z
eng
BMC
BMC Genomics
1471-2164
2024-01-01
25
1
1
15
10.1186/s12864-023-09833-0
Genomic data integration tutorial, a plant case study
Emile Mardoc, UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC)
Mamadou Dia Sow, UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC)
Sébastien Déjean, Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, Université Paul Sabatier
Jérôme Salse, UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC)
Abstract. Background: The ongoing evolution of Next Generation Sequencing (NGS) technologies has led to the production of genomic data on a massive scale. While tools for genomic data integration and analysis are becoming increasingly available, the conceptual and analytical complexities still represent a great challenge in many biological contexts. Results: To address this issue, we describe a six-step tutorial on best practices in genomic data integration, consisting of (1) designing a data matrix; (2) formulating a specific biological question toward data description, selection and prediction; (3) selecting a tool adapted to the targeted questions; (4) preprocessing the data; (5) conducting preliminary analysis; and finally (6) executing genomic data integration. Conclusion: The tutorial has been tested and demonstrated on publicly available genomic data generated from poplar (Populus L.), a woody plant model. We also developed a new graphical output for the unsupervised multi-block analysis, cimDiablo_v2, available at https://forgemia.inra.fr/umr-gdec/omics-integration-on-poplar, which allows the selection of master drivers in genomic data variation and interplay.
https://doi.org/10.1186/s12864-023-09833-0
Omics
Integration
System
Biology
title Genomic data integration tutorial, a plant case study
topic Omics
Integration
System
Biology
url https://doi.org/10.1186/s12864-023-09833-0