Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection
The widespread use of High-Throughput Sequencing (HTS) for detection of plant viruses and sequencing of plant virus genomes has led to the generation of large amounts of data and of bioinformatics challenges to process them. Many bioinformatics pipelines for virus detection are available, making the...
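Main Authors: | Tamisier, Lucie; Haegeman, Annelies; Foucart, Yoika; Fouillien, Nicolas; Al Rwahnih, Maher; Buzkan, Nihal; Candresse, Thierry; Chiumenti, Michela; De Jonghe, Kris; Lefebvre, Marie; Margaria, Paolo; Reynard, Jean Sébastien; Stevens, Kristian; Kutnjak, Denis; Massart, Sébastien |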
---|---|
Format: | Article |
Language: | English |
Published: | Peer Community In, 2021-12-01 |
Series: | Peer Community Journal |
Online Access: | https://peercommunityjournal.org/articles/10.24072/pcjournal.62/ |
_version_ | 1797651175263174656 |
---|---|
author | Tamisier, Lucie; Haegeman, Annelies; Foucart, Yoika; Fouillien, Nicolas; Al Rwahnih, Maher; Buzkan, Nihal; Candresse, Thierry; Chiumenti, Michela; De Jonghe, Kris; Lefebvre, Marie; Margaria, Paolo; Reynard, Jean Sébastien; Stevens, Kristian; Kutnjak, Denis; Massart, Sébastien |
author_sort | Tamisier, Lucie |
collection | DOAJ |
description | The widespread use of High-Throughput Sequencing (HTS) for detection of plant viruses and sequencing of plant virus genomes has led to the generation of large amounts of data and of bioinformatics challenges to process them. Many bioinformatics pipelines for virus detection are available, making the choice of a suitable one difficult. A robust benchmarking is needed for the unbiased comparison of the pipelines, but there is currently a lack of reference datasets that could be used for this purpose. We present 7 semi-artificial datasets composed of real RNA-seq datasets from virus-infected plants spiked with artificial virus reads. Each dataset addresses challenges that could prevent virus detection. We also present 3 real datasets showing a challenging virus composition as well as 8 completely artificial datasets to test haplotype reconstruction software. With these datasets that address several diagnostic challenges, we hope to encourage virologists, diagnosticians and bioinformaticians to evaluate and benchmark their pipeline(s). |
first_indexed | 2024-03-11T16:12:07Z |
format | Article |
id | doaj.art-5785239bbcdd41bfb6a61d80d9e92215 |
institution | Directory Open Access Journal |
issn | 2804-3871 |
language | English |
last_indexed | 2024-03-11T16:12:07Z |
publishDate | 2021-12-01 |
publisher | Peer Community In |
record_format | Article |
series | Peer Community Journal |
spelling | doaj.art-5785239bbcdd41bfb6a61d80d9e92215; 2023-10-24T14:38:24Z; eng; Peer Community In; Peer Community Journal; 2804-3871; 2021-12-01; 1; 10.24072/pcjournal.62; Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection; Tamisier, Lucie (0) https://orcid.org/0000-0002-9231-2997; Haegeman, Annelies (1) https://orcid.org/0000-0002-8192-5368; Foucart, Yoika (2); Fouillien, Nicolas (3); Al Rwahnih, Maher (4) https://orcid.org/0000-0003-1589-9234; Buzkan, Nihal (5) https://orcid.org/0000-0002-0428-0447; Candresse, Thierry (6); Chiumenti, Michela (7) https://orcid.org/0000-0002-8412-3037; De Jonghe, Kris (8); Lefebvre, Marie (9) https://orcid.org/0000-0002-3093-5873; Margaria, Paolo (10) https://orcid.org/0000-0001-8670-1331; Reynard, Jean Sébastien (11) https://orcid.org/0000-0002-2337-107X; Stevens, Kristian (12); Kutnjak, Denis (13) https://orcid.org/0000-0002-5327-0587; Massart, Sébastien (14) https://orcid.org/0000-0002-7576-6188; (0) Université de Liège, Terra-Gembloux Agro-Bio Tech, Plant Pathology Laboratory, Passage des Déportés, 2, 5030 Gembloux, Belgium; (1) Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Burg. Van Gansberghelaan 96, 9820 Merelbeke, Belgium; (2) Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Burg. Van Gansberghelaan 96, 9820 Merelbeke, Belgium; (3) Université de Liège, Terra-Gembloux Agro-Bio Tech, Plant Pathology Laboratory, Passage des Déportés, 2, 5030 Gembloux, Belgium; (4) Department of Plant Pathology, University of California, Davis, California 95616, USA; (5) Department of Plant Protection, Faculty of Agriculture, University of Sütçü Imam, Kahramanmaras 46060, Turkey; (6) Univ. Bordeaux, INRAE, UMR BFP, CS20032, 33882 Villenave d’Ornon cedex, France; (7) Institute for Sustainable Plant Protection, CNR, Via Amendola 122/D, Bari 70126, Italy; (8) Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Burg. Van Gansberghelaan 96, 9820 Merelbeke, Belgium; (9) Univ. Bordeaux, INRAE, UMR BFP, CS20032, 33882 Villenave d’Ornon cedex, France; (10) Leibniz Institute - DSMZ, German Collection of Microorganisms and Cell Cultures GmbH, 38124 Braunschweig, Germany; (11) Virology, Agroscope, Nyon, Switzerland; (12) Department of Evolution and Ecology, University of California, Davis, California 95616, USA, and Department of Plant Pathology, University of California, Davis, California 95616, USA; (13) Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia; (14) Université de Liège, Terra-Gembloux Agro-Bio Tech, Plant Pathology Laboratory, Passage des Déportés, 2, 5030 Gembloux, Belgium; https://peercommunityjournal.org/articles/10.24072/pcjournal.62/ |
title | Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection |
url | https://peercommunityjournal.org/articles/10.24072/pcjournal.62/ |