Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome
Main Authors: | Fedor M. Naumenko, Irina I. Abnizova, Nathan Beka, Mikhail A. Genaev, Yuriy L. Orlov |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2018-02-01 |
Series: | BMC Genomics |
Online Access: | http://link.springer.com/article/10.1186/s12864-018-4475-6 |
author | Fedor M. Naumenko, Irina I. Abnizova, Nathan Beka, Mikhail A. Genaev, Yuriy L. Orlov |
collection | DOAJ |
description | Background: Using artificial data to evaluate the performance of aligners and peak callers not only improves the accuracy and reliability of the evaluation but also makes it possible to reduce computational time. A natural way to achieve such a time reduction is to map reads to a single chromosome. Results: We investigated whether single-chromosome mapping introduces artefacts into aligner performance. We compared the accuracy of seven aligners on well-controlled simulated benchmark data sampled both from a single chromosome and from a whole genome. We found that commonly used statistical methods are insufficient to evaluate aligner performance, and applied a novel measure of read density distribution similarity, which revealed artefacts in the aligners' performance. We also calculated mismatch statistics and constructed mismatch frequency distributions along the read. Conclusions: Generating artificial data by mapping reads generated from a single chromosome to a reference chromosome is justified as a way to reduce benchmarking time. The proposed quality assessment method can identify inherent shortcomings of aligners that conventional statistical methods do not detect and that can affect the alignment quality of real data. |
issn | 1471-2164 |
affiliations | Fedor M. Naumenko: Novosibirsk State University; Irina I. Abnizova: Wellcome Trust Sanger Institute; Nathan Beka: University of Hertfordshire; Mikhail A. Genaev: Institute of Cytology and Genetics SB RAS; Yuriy L. Orlov: Novosibirsk State University |
volume | 19, issue S3 |
doi | 10.1186/s12864-018-4475-6 |
topic | Next-generation sequencing; DNA alignment; Read density distribution |
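The description above mentions a read density distribution similarity score and mismatch frequency distributions along the read, but this record does not define either. The Python sketch below shows one generic way to compute both, under stated assumptions: read positions are 0-based start coordinates on a single chromosome, a density is a normalised histogram over fixed-width bins, similarity is histogram intersection (a stand-in chosen for illustration, not the authors' score), and mismatch profiles use ungapped read/reference pairs of equal length. The function names `read_density`, `density_similarity` and `mismatch_profile` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def read_density(positions: Sequence[int], chrom_length: int, bin_size: int) -> List[float]:
    """Normalised histogram of 0-based read start positions in fixed-width bins.

    Positions outside [0, chrom_length) are ignored; returns all zeros if no
    read falls on the chromosome.
    """
    n_bins = math.ceil(chrom_length / bin_size)
    counts = [0] * n_bins
    for pos in positions:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


def density_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Histogram intersection of two normalised densities (hypothetical stand-in,
    not the paper's score): 1.0 when identical, 0.0 when no bin is shared."""
    if len(p) != len(q):
        raise ValueError("densities must use the same binning")
    return sum(min(a, b) for a, b in zip(p, q))


def mismatch_profile(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Per-position mismatch frequency along the read.

    Each pair is (read, reference) for an ungapped alignment of equal length
    (an assumption; indels would need CIGAR-aware handling). The value at
    position i is the number of mismatches at i divided by the number of
    reads that are at least i + 1 bases long.
    """
    max_len = max((len(read) for read, _ in pairs), default=0)
    mismatches = [0] * max_len
    coverage = [0] * max_len
    for read, ref in pairs:
        if len(read) != len(ref):
            raise ValueError("expected ungapped read/reference pairs of equal length")
        for i, (a, b) in enumerate(zip(read, ref)):
            coverage[i] += 1
            if a != b:
                mismatches[i] += 1
    return [m / c if c else 0.0 for m, c in zip(mismatches, coverage)]


if __name__ == "__main__":
    # Toy data: simulated (true) start positions vs. positions reported by an aligner.
    truth = read_density([5, 15, 25, 35], chrom_length=100, bin_size=10)
    mapped = read_density([5, 15, 25, 95], chrom_length=100, bin_size=10)
    print(density_similarity(truth, mapped))  # 0.75: one of four reads landed in the wrong bin
    print(mismatch_profile([("ACGT", "ACGA"), ("ACGT", "TCGT")]))  # [0.5, 0.0, 0.0, 0.5]
```

Histogram intersection is used here only because it is bounded in [0, 1] and easy to interpret; a divergence measure such as Jensen-Shannon could be substituted without changing the binning step.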