Performance of Mapping Approaches for Whole-Genome Bisulfite Sequencing Data in Crop Plants

DNA methylation is involved in many different biological processes in the development and well-being of crop plants such as transposon activation, heterosis, environment-dependent transcriptome plasticity, aging, and many diseases. Whole-genome bisulfite sequencing is an excellent technology for det...

Bibliographic Details
Main Authors: Claudius Grehl, Marc Wagner, Ioana Lemnian, Bruno Glaser, Ivo Grosse
Format: Article
Language: English
Published: Frontiers Media S.A. 2020-02-01
Series: Frontiers in Plant Science
Subjects: epigenetics, DNA methylation patterns, read mapping, benchmarking, WGBS
Online Access: https://www.frontiersin.org/article/10.3389/fpls.2020.00176/full
author Claudius Grehl
Claudius Grehl
Marc Wagner
Ioana Lemnian
Ioana Lemnian
Bruno Glaser
Ivo Grosse
Ivo Grosse
author_sort Claudius Grehl
collection DOAJ
description DNA methylation is involved in many different biological processes in the development and well-being of crop plants such as transposon activation, heterosis, environment-dependent transcriptome plasticity, aging, and many diseases. Whole-genome bisulfite sequencing (WGBS) is an excellent technology for detecting and quantifying DNA methylation patterns in a wide variety of species, but optimized data analysis pipelines exist only for a small number of species and are missing for many important crop plants. This is especially important as most existing benchmark studies have been performed on mammals with hardly any repetitive elements and without CHG and CHH methylation. Pipelines for the analysis of whole-genome bisulfite sequencing data usually consist of four steps: read trimming, read mapping, quantification of methylation levels, and prediction of differentially methylated regions (DMRs). Here we focus on read mapping, which is challenging because unmethylated cytosines are transformed to uracil during bisulfite treatment and to thymine during the subsequent polymerase chain reaction, so read mappers must be capable of dealing with this cytosine/thymine polymorphism. Several read mappers have been developed in recent years, with different strengths and weaknesses, but their performance has not been critically evaluated. Here, we compare eight read mappers (Bismark, BismarkBwt2, BSMAP, BS-Seeker2, Bwameth, GEM3, Segemehl, and GSNAP) to assess the impact of the read-mapping results on the prediction of DMRs.
We used simulated data generated from the genomes of Arabidopsis thaliana, Brassica napus, Glycine max, Solanum tuberosum, and Zea mays, monitored the effects of the bisulfite conversion rate, the sequencing error rate, the maximum number of allowed mismatches, as well as the genome structure and size, and calculated precision, number of uniquely mapped reads, distribution of the mapped reads, run time, and memory consumption as features for benchmarking the eight read mappers mentioned above. Furthermore, we validated our findings using real-world data of Glycine max and showed the influence of the mapping step on DMR calling in WGBS pipelines. We found that the conversion rate had only a minor impact on the mapping quality and the number of uniquely mapped reads, whereas the error rate and the maximum number of allowed mismatches had a strong impact and led to differences in the performance of the eight read mappers. In conclusion, we recommend BSMAP, which needs the shortest run time and yields the highest precision, and Bismark, which requires the smallest amount of memory and yields high precision and high numbers of uniquely mapped reads.
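The cytosine/thymine polymorphism described above can be illustrated with a minimal Python sketch. It assumes one common strategy, in which both the read and the reference are converted in silico (every C becomes T) before they are compared; this is not a description of how any of the eight benchmarked mappers is implemented. The function names `bisulfite_convert`, `three_letter_match`, and `precision` are hypothetical, only the forward strand is handled, and precision is taken here as correctly mapped reads divided by all mapped reads, which is an assumption about the benchmark's definition.

```python
from typing import FrozenSet, List, Optional, Sequence


def bisulfite_convert(seq: str, methylated: FrozenSet[int] = frozenset()) -> str:
    """Simulate bisulfite treatment plus PCR on a forward-strand read.

    Unmethylated cytosines end up as thymine; positions listed in
    `methylated` (0-based, relative to `seq`) stay cytosine.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def three_letter_match(read: str, reference: str) -> List[int]:
    """Return every start position where the read matches the reference
    exactly after both have been fully C->T converted."""
    r = read.upper().replace("C", "T")
    g = reference.upper().replace("C", "T")
    return [i for i in range(len(g) - len(r) + 1) if g[i:i + len(r)] == r]


def precision(true_pos: Sequence[int], mapped_pos: Sequence[Optional[int]]) -> float:
    """Fraction of mapped reads placed at their true origin.

    `None` in `mapped_pos` marks an unmapped read, which is excluded
    from the denominator (assumed definition, see lead-in).
    """
    mapped = [(t, m) for t, m in zip(true_pos, mapped_pos) if m is not None]
    if not mapped:
        return 0.0
    return sum(t == m for t, m in mapped) / len(mapped)


# Toy example: a read taken from position 4 with one methylated cytosine.
reference = "ACGTTCGACCGT"
read = bisulfite_convert(reference[4:9], methylated=frozenset({1}))
hits = three_letter_match(read, reference)
```

In this toy case the converted read no longer matches the original reference literally, but it is still located once both sequences are reduced to the three-letter alphabet. The reduction also makes distinct genomic regions look more alike, which is one reason repetitive crop genomes are harder to map uniquely.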
first_indexed 2024-04-13T09:23:41Z
format Article
id doaj.art-a285095ef5784e13b74c24b2e213ae93
institution Directory Open Access Journal
issn 1664-462X
language English
last_indexed 2024-04-13T09:23:41Z
publishDate 2020-02-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Plant Science
spelling doaj.art-a285095ef5784e13b74c24b2e213ae93
2022-12-22T02:52:31Z
eng
Frontiers Media S.A.
Frontiers in Plant Science
1664-462X
2020-02-01
11
10.3389/fpls.2020.00176
504419
Performance of Mapping Approaches for Whole-Genome Bisulfite Sequencing Data in Crop Plants
Claudius Grehl 0
Claudius Grehl 1
Marc Wagner 2
Ioana Lemnian 3
Ioana Lemnian 4
Bruno Glaser 5
Ivo Grosse 6
Ivo Grosse 7
0 Institute of Computer Science, Bioinformatics, Martin Luther University Halle–Wittenberg, Von Seckendorff-Platz 1, Halle (Saale), Germany
1 Institute of Agronomy and Nutritional Sciences, Soil Biogeochemistry, Martin Luther University Halle–Wittenberg, Von Seckendorff-Platz 3, Halle (Saale), Germany
2 Institute of Mathematics and Informatics, Freie Universität Berlin, Berlin, Germany
3 Institute of Computer Science, Bioinformatics, Martin Luther University Halle–Wittenberg, Von Seckendorff-Platz 1, Halle (Saale), Germany
4 Institute of Human Genetics, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
5 Institute of Agronomy and Nutritional Sciences, Soil Biogeochemistry, Martin Luther University Halle–Wittenberg, Von Seckendorff-Platz 3, Halle (Saale), Germany
6 Institute of Computer Science, Bioinformatics, Martin Luther University Halle–Wittenberg, Von Seckendorff-Platz 1, Halle (Saale), Germany
7 Bioinformatics Unit, German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
https://www.frontiersin.org/article/10.3389/fpls.2020.00176/full
epigenetics
DNA methylation patterns
read mapping
benchmarking
WGBS
title Performance of Mapping Approaches for Whole-Genome Bisulfite Sequencing Data in Crop Plants
topic epigenetics
DNA methylation patterns
read mapping
benchmarking
WGBS
url https://www.frontiersin.org/article/10.3389/fpls.2020.00176/full