SNP allele calling of Illumina Infinium Omni5-4 data using the butterfly method

Bibliographic Details
Main Authors: Mikkel Meyer Andersen, Steffan Noe Christiansen, Jeppe Dyrberg Andersen, Poul Svante Eriksen, Niels Morling
Format: Article
Language: English
Published: Nature Portfolio, 2022-10-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-022-22162-8
ISSN: 2045-2322

Affiliations
Mikkel Meyer Andersen: Department of Mathematical Sciences, Aalborg University
Steffan Noe Christiansen: Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen
Jeppe Dyrberg Andersen: Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen
Poul Svante Eriksen: Department of Mathematical Sciences, Aalborg University
Niels Morling: Department of Mathematical Sciences, Aalborg University

Abstract
We introduce a within-sample SNP calling method, called the “butterfly method”, that improves the quality of SNP calling with the Illumina Infinium Omni5-4 SNP Kit. This was done by improving how no-calls are determined from allele signal intensities. High confidence in SNP allele calling is extremely important in forensic genetics and clinical diagnostics. This paper is accompanied by two open-source R packages, omni54manifest and snpbeadchip, that make SNP calling easy by handling bookkeeping and giving easy access to meta-information about the SNPs typed with the Illumina Infinium Omni5-4 Kit (including chromosome, probe type, and SNP bases). We compared the results from our method with those obtained with the Illumina GenomeStudio software (which does not provide sample- and SNP-specific genotype probabilities or other quality measures) and with whole-genome sequencing (WGS). Given the signal intensities, the SNP calling quality was optimised using a threshold for the a posteriori probability of a SNP belonging to a SNP cluster. By lowering this a posteriori probability threshold for no-calls, we obtained a higher call rate than GenomeStudio. Using a higher threshold, we achieved higher concordance with the WGS data than GenomeStudio. Our method had SNP call and concordance rates with WGS data of approximately 99%.
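
The abstract's central idea, declaring a no-call whenever the a posteriori probability of the most likely genotype cluster falls below a threshold, can be illustrated with a minimal Python sketch. This is not the butterfly method itself: the sketch assumes three fixed one-dimensional Gaussian clusters (AA, AB, BB) on the polar-angle scale θ = (2/π)·arctan(Y/X) commonly used for Illumina intensity data, with hypothetical means, standard deviations and weights, whereas the paper fits its own within-sample model. The names `CLUSTERS`, `posteriors`, `call` and `call_and_concordance_rates`, and the toy intensities and reference genotypes, are illustrative assumptions and are not part of the omni54manifest or snpbeadchip packages.

```python
from __future__ import annotations

import math

# HYPOTHETICAL per-sample cluster parameters on the theta scale (illustrative
# values, not estimates from the paper): genotype -> (mean, sd, mixing weight).
CLUSTERS = {
    "AA": (0.0, 0.05, 0.3),
    "AB": (0.5, 0.07, 0.4),
    "BB": (1.0, 0.05, 0.3),
}


def theta(x: float, y: float) -> float:
    """Polar angle of the (X, Y) allele intensities, scaled to [0, 1]."""
    return (2.0 / math.pi) * math.atan2(y, x)


def normal_pdf(v: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and sd at v."""
    z = (v - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(t: float) -> dict[str, float]:
    """A posteriori probability of each genotype cluster given theta t (Bayes' rule)."""
    joint = {g: w * normal_pdf(t, m, s) for g, (m, s, w) in CLUSTERS.items()}
    total = sum(joint.values())
    return {g: p / total for g, p in joint.items()}


def call(x: float, y: float, threshold: float) -> str | None:
    """Most probable genotype, or None (a no-call) if its posterior is below threshold."""
    post = posteriors(theta(x, y))
    best = max(post, key=post.get)
    return best if post[best] >= threshold else None


def call_and_concordance_rates(intensities, reference, threshold):
    """Call rate over all SNPs; concordance with reference genotypes over called SNPs only."""
    calls = [call(x, y, threshold) for x, y in intensities]
    called = [(c, r) for c, r in zip(calls, reference) if c is not None]
    call_rate = len(called) / len(calls)
    concordance = sum(c == r for c, r in called) / len(called) if called else float("nan")
    return call_rate, concordance


# Toy data: three clear SNPs and one lying between the AA and AB clusters.
INTENSITIES = [(1.0, 0.02), (0.7, 0.7), (0.03, 1.0), (1.0, 0.325)]
REFERENCE = ["AA", "AB", "BB", "AB"]  # stand-in for genotypes from an independent assay

for thr in (0.99, 0.5):
    rate, conc = call_and_concordance_rates(INTENSITIES, REFERENCE, thr)
    print(f"threshold={thr}: call rate={rate:.2f}, concordance={conc:.2f}")
```

With these toy values, the threshold 0.99 leaves the ambiguous fourth SNP uncalled (call rate 0.75, concordance 1.00), while 0.5 calls it AA against an AB reference (call rate 1.00, concordance 0.75). This mirrors, in miniature, the trade-off between call rate and concordance that the abstract describes when the a posteriori probability threshold is lowered or raised.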