Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection

The widespread use of High-Throughput Sequencing (HTS) for detection of plant viruses and sequencing of plant virus genomes has led to the generation of large amounts of data and of bioinformatics challenges to process them. Many bioinformatics pipelines for virus detection are available, making the...

Bibliographic Details
Main Authors: Tamisier, Lucie; Haegeman, Annelies; Foucart, Yoika; Fouillien, Nicolas; Al Rwahnih, Maher; Buzkan, Nihal; Candresse, Thierry; Chiumenti, Michela; De Jonghe, Kris; Lefebvre, Marie; Margaria, Paolo; Reynard, Jean Sébastien; Stevens, Kristian; Kutnjak, Denis; Massart, Sébastien
Format: Article
Language: English
Published: Peer Community In 2021-12-01
Series: Peer Community Journal
Online Access: https://peercommunityjournal.org/articles/10.24072/pcjournal.62/
author Tamisier, Lucie
Haegeman, Annelies
Foucart, Yoika
Fouillien, Nicolas
Al Rwahnih, Maher
Buzkan, Nihal
Candresse, Thierry
Chiumenti, Michela
De Jonghe, Kris
Lefebvre, Marie
Margaria, Paolo
Reynard, Jean Sébastien
Stevens, Kristian
Kutnjak, Denis
Massart, Sébastien
author_sort Tamisier, Lucie
collection DOAJ
description The widespread use of High-Throughput Sequencing (HTS) for detection of plant viruses and sequencing of plant virus genomes has led to the generation of large amounts of data and of bioinformatics challenges to process them. Many bioinformatics pipelines for virus detection are available, making the choice of a suitable one difficult. A robust benchmarking is needed for the unbiased comparison of the pipelines, but there is currently a lack of reference datasets that could be used for this purpose. We present 7 semi-artificial datasets composed of real RNA-seq datasets from virus-infected plants spiked with artificial virus reads. Each dataset addresses challenges that could prevent virus detection. We also present 3 real datasets showing a challenging virus composition as well as 8 completely artificial datasets to test haplotype reconstruction software. With these datasets that address several diagnostic challenges, we hope to encourage virologists, diagnosticians and bioinformaticians to evaluate and benchmark their pipeline(s).
first_indexed 2024-03-11T16:12:07Z
format Article
id doaj.art-5785239bbcdd41bfb6a61d80d9e92215
institution Directory Open Access Journal
issn 2804-3871
language English
last_indexed 2024-03-11T16:12:07Z
publishDate 2021-12-01
publisher Peer Community In
record_format Article
series Peer Community Journal
spelling doaj.art-5785239bbcdd41bfb6a61d80d9e92215
2023-10-24T14:38:24Z
eng
Peer Community In
Peer Community Journal
2804-3871
2021-12-01
1
10.24072/pcjournal.62
Tamisier, Lucie (https://orcid.org/0000-0002-9231-2997), Université de Liège, Terra-Gembloux Agro-Bio Tech, Plant Pathology Laboratory, Passage des Déportés, 2, 5030 Gembloux, Belgium
Haegeman, Annelies (https://orcid.org/0000-0002-8192-5368), Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Burg. Van Gansberghelaan 96, 9820 Merelbeke, Belgium
Foucart, Yoika, Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Burg. Van Gansberghelaan 96, 9820 Merelbeke, Belgium
Fouillien, Nicolas, Université de Liège, Terra-Gembloux Agro-Bio Tech, Plant Pathology Laboratory, Passage des Déportés, 2, 5030 Gembloux, Belgium
Al Rwahnih, Maher (https://orcid.org/0000-0003-1589-9234), Department of Plant Pathology, University of California, Davis, California 95616, USA
Buzkan, Nihal (https://orcid.org/0000-0002-0428-0447), Department of Plant Protection, Faculty of Agriculture, University of Sütçü Imam, Kahramanmaras 46060, Turkey
Candresse, Thierry, Univ. Bordeaux, INRAE, UMR BFP, CS20032, 33882 Villenave d’Ornon cedex, France
Chiumenti, Michela (https://orcid.org/0000-0002-8412-3037), Institute for Sustainable Plant Protection, CNR, Via Amendola 122/D, Bari 70126, Italy
De Jonghe, Kris, Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Burg. Van Gansberghelaan 96, 9820 Merelbeke, Belgium
Lefebvre, Marie (https://orcid.org/0000-0002-3093-5873), Univ. Bordeaux, INRAE, UMR BFP, CS20032, 33882 Villenave d’Ornon cedex, France
Margaria, Paolo (https://orcid.org/0000-0001-8670-1331), Leibniz Institute - DSMZ, German Collection of Microorganisms and Cell Cultures GmbH, 38124 Braunschweig, Germany
Reynard, Jean Sébastien (https://orcid.org/0000-0002-2337-107X), Virology, Agroscope, Nyon, Switzerland
Stevens, Kristian, Department of Evolution and Ecology, University of California, Davis, California 95616, USA; Department of Plant Pathology, University of California, Davis, California 95616, USA
Kutnjak, Denis (https://orcid.org/0000-0002-5327-0587), Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
Massart, Sébastien (https://orcid.org/0000-0002-7576-6188), Université de Liège, Terra-Gembloux Agro-Bio Tech, Plant Pathology Laboratory, Passage des Déportés, 2, 5030 Gembloux, Belgium
title Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection
url https://peercommunityjournal.org/articles/10.24072/pcjournal.62/