Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome

Bibliographic Details
Main Authors: Fedor M. Naumenko, Irina I. Abnizova, Nathan Beka, Mikhail A. Genaev, Yuriy L. Orlov
Format: Article
Language: English
Published: BMC 2018-02-01
Series: BMC Genomics
Online Access: http://link.springer.com/article/10.1186/s12864-018-4475-6

Abstract

Background: Using artificial data to evaluate the performance of aligners and peak callers not only improves the accuracy and reliability of the evaluation but also makes it possible to reduce computational time. One natural way to achieve this reduction is to map reads to a single chromosome rather than to the whole genome.

Results: We investigated whether mapping to a single chromosome introduces artefacts into aligner performance. We compared the accuracy of seven aligners on well-controlled simulated benchmark data sampled both from a single chromosome and from the whole genome. We found that commonly used statistical methods are insufficient to evaluate aligner performance, and we applied a novel measure of read density distribution similarity, which revealed artefacts in the aligners' performance. We also calculated mismatch statistics and constructed mismatch frequency distributions along the read.

Conclusions: Generating artificial data by mapping reads simulated from a single chromosome to a reference chromosome is justified as a way to reduce benchmarking time. The proposed quality assessment method makes it possible to identify inherent shortcomings of aligners that conventional statistical methods do not detect and that can affect the quality of alignment of real data.
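The abstract does not define the proposed read density distribution score or the exact mismatch analysis, so the following Python code is only a minimal sketch of the two ideas under stated assumptions, not the authors' method. `read_density` bins 0-based read start positions along one chromosome into fixed-width windows (the window width `bin_size` is a hypothetical parameter) and normalises the counts into a distribution. `density_similarity` compares a simulated ("true") distribution with an aligner's distribution using histogram intersection, a generic stand-in chosen here rather than the paper's score. `mismatch_profile` counts mismatches at each read position, assuming ungapped alignments supplied as (read, reference segment) string pairs. All function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def read_density(positions: Iterable[int], chrom_length: int, bin_size: int) -> List[float]:
    """Bin 0-based read start positions into fixed-width windows and normalise.

    Positions outside [0, chrom_length) are ignored. Returns all zeros when no
    read falls on the chromosome.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def density_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Histogram intersection (stand-in score, not the paper's).

    Returns 1.0 for identical normalised distributions and 0.0 for disjoint ones.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    return sum(min(a, b) for a, b in zip(p, q))


def mismatch_profile(alignments: Iterable[Tuple[str, str]]) -> List[int]:
    """Count mismatches at each read position over (read, reference_segment) pairs.

    Assumes ungapped alignments: base i of the read is compared with base i of
    the reference segment it was aligned to; comparison is case-insensitive.
    """
    profile: List[int] = []
    for read, ref in alignments:
        if len(read) > len(profile):
            profile.extend([0] * (len(read) - len(profile)))
        for i, (r, g) in enumerate(zip(read, ref)):
            if r.upper() != g.upper():
                profile[i] += 1
    return profile


if __name__ == "__main__":
    # Toy data: four simulated reads spread evenly, but the aligner piles two into the first window.
    truth = read_density([0, 10, 20, 30], chrom_length=40, bin_size=10)
    mapped = read_density([0, 0, 20, 30], chrom_length=40, bin_size=10)
    print(density_similarity(truth, mapped))  # 0.75
    print(mismatch_profile([("ACGT", "ACGA"), ("ACGT", "TCGT")]))  # [1, 0, 0, 1]
```

With these toy inputs the aligned reads concentrate in the first window, so the similarity falls from 1.0 to 0.75 even though every read was placed somewhere. A real benchmark would use many reads and would also have to handle reverse-strand, gapped, and unmapped alignments.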

Additional Details
Subjects: Next-generation sequencing; DNA alignment; Read density distribution
Volume: 19 (Suppl. 3)
ISSN: 1471-2164
Affiliations: Novosibirsk State University; Wellcome Trust Sanger Institute; University of Hertfordshire; Institute of Cytology and Genetics SB RAS
Collection: DOAJ (Directory of Open Access Journals)