Optimized Next-Generation Sequencing Genotype-Haplotype Calling for Genome Variability Analysis

The accurate estimation of nucleotide variability using next-generation sequencing data is challenged by the high number of sequencing errors produced by new sequencing technologies, especially for nonmodel species, where reference sequences may not be available and the read depth may be low due to...

Bibliographic Details
Main Authors:	Javier Navarro, Bruno Nevado, Porfidio Hernández, Gonzalo Vera, Sebastián E Ramos-Onsins
Format:	Article
Language:	English
Published:	SAGE Publishing 2017-08-01
Series:	Evolutionary Bioinformatics
Online Access:	https://doi.org/10.1177/1176934317723884

collection	DOAJ
description	The accurate estimation of nucleotide variability from next-generation sequencing data is challenged by the high number of sequencing errors produced by new sequencing technologies, especially for nonmodel species, where reference sequences may not be available and read depth may be low because of limited budgets. The most popular single-nucleotide polymorphism (SNP) callers are designed to achieve high SNP recovery and a low false discovery rate, but they are not designed to account appropriately for the frequency of the variants. In contrast, algorithms designed to account for the frequency of SNPs give precise estimates of the levels and patterns of variability; they focus on the unbiased estimation of variability rather than on high SNP recovery. Here, we implemented a fast, optimized parallel algorithm that includes the method developed by Roesti et al and Lynch, which estimates the genotype of each individual at each site and can call both bases of the genotype, a single base, or none. The algorithm does not consider the reference and is therefore independent of biases related to the specified reference nucleotide. The pipeline starts from a BAM file converted to pileup or mpileup format, and the software outputs a FASTA file. The new program not only reduces running times but also, through its improved use of resources, runs on both small computers and large parallel computers, extending its benefits to a wider range of researchers. The output file can be analyzed with population genetics software, such as the R library PopGenome, the software VariScan, and the program mstatspop for analyses that consider positions with missing data.
first_indexed	2024-12-13T08:27:19Z
id	doaj.art-eb36cff05ea644a2b5f3d2fd5d5a252c
institution	Directory Open Access Journal
issn	1176-9343
last_indexed	2024-12-13T08:27:19Z
spelling	doaj.art-eb36cff05ea644a2b5f3d2fd5d5a252c; 2022-12-21T23:53:52Z; eng; SAGE Publishing; Evolutionary Bioinformatics; 1176-9343; 2017-08-01; 13; 10.1177/1176934317723884; Optimized Next-Generation Sequencing Genotype-Haplotype Calling for Genome Variability Analysis
	Javier Navarro (0), Bruno Nevado (1), Porfidio Hernández (2), Gonzalo Vera (3), Sebastián E Ramos-Onsins (4)
	(0) Computer Architecture and Operating Systems Department, Universitat Autònoma de Barcelona, Barcelona, Spain
	(1) Department of Plant Sciences, University of Oxford, Oxford, UK
	(2) Computer Architecture and Operating Systems Department, Universitat Autònoma de Barcelona, Barcelona, Spain
	(3) Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Barcelona, Spain
	(4) Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Barcelona, Spain

Similar Items