A New Computational Deconvolution Algorithm for the Analysis of Forensic DNA Mixtures with SNP Markers

Bibliographic Details
Main Authors: Yu Yin, Peng Zhang, Yu Xing
Format: Article
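Language: English
Published: MDPI AG 2022-05-01
Series: Genes
Subjects: forensic genetics; bioinformatics; single nucleotide polymorphism (SNP); massively parallel sequencing (MPS); Precision ID Identity Panel; DNA mixture deconvolution
Online Access: https://www.mdpi.com/2073-4425/13/5/884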
author Yu Yin
Peng Zhang
Yu Xing
collection DOAJ
description Single nucleotide polymorphisms (SNPs) support robust analysis of degraded DNA samples. However, systematic methods for interpreting SNP profiles derived from mixtures have received less study, and interpretation remains challenging because SNP markers are bi-allelic. To improve the discriminating power of SNPs, this study explored bioinformatic strategies for analyzing mixtures. Computer-generated mixtures were produced using real-world massively parallel sequencing (MPS) data from single-source samples processed with the Precision ID Identity Panel. The frequency of major allele reads (F_MAR) was calculated and used as the key parameter to deconvolve two-person mixtures and estimate mixture ratios. Four custom R scripts (three for autosomes and one for the Y chromosome) were designed with K-means clustering as the core algorithm. Finally, the method was validated with real-world mixtures. The deconvolution accuracy for evenly balanced mixtures was 100% or close to 100%, matching the accuracy of inferring the genotypes of the major contributor in unevenly balanced mixtures. The accuracy of inferring the genotypes of the minor contributor decreased as its proportion in the mixture decreased. The estimated mixture ratio was almost equal to the actual ratio for mixtures between 1:1 and 1:6. The proposed method provides a new paradigm for mixture interpretation, especially for inferring contributor profiles of evenly balanced mixtures and the major contributor profile of unevenly balanced mixtures.
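The description names two ingredients, the frequency of major allele reads and K-means clustering, without showing how they fit together. The Python sketch below is one possible illustration and is not the authors' R scripts. It assumes that reads at each SNP are proportional to each contributor's share of the mixture, so that a minor fraction m would place the frequency of major allele reads near one of five levels: 0.5, (1 + m) / 2, 1 - m, 1 - m / 2 and 1.0. The read counts, the function names fmar and kmeans_1d, the choice of k = 5, and the ratio estimate taken from the second-highest cluster center are all hypothetical choices made for this example.

```python
# Illustrative sketch only; not the authors' R scripts.
# Assumption: allele reads are proportional to each contributor's share.


def fmar(reads_a, reads_b):
    """Frequency of major allele reads for one bi-allelic SNP."""
    total = reads_a + reads_b
    if total == 0:
        raise ValueError("no reads at this SNP")
    return max(reads_a, reads_b) / total


def kmeans_1d(values, k, max_iter=100):
    """Plain Lloyd's k-means on scalars with evenly spaced starting centers."""
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def assign(current):
        # Index of the nearest center for every value.
        return [min(range(k), key=lambda j: abs(v - current[j])) for v in values]

    for _ in range(max_iter):
        labels = assign(centers)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous center.
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, assign(centers)


# Hypothetical read counts (allele 1, allele 2) at ten SNPs of a toy mixture.
toy_reads = [
    (200, 0), (198, 2),      # both contributors homozygous for the same allele
    (175, 25), (26, 174),    # major homozygous, minor heterozygous
    (150, 50), (49, 151),    # contributors homozygous for opposite alleles
    (125, 75), (76, 124),    # major heterozygous, minor homozygous
    (100, 100), (102, 98),   # both contributors heterozygous
]

fmar_values = [fmar(a, b) for a, b in toy_reads]
centers, labels = kmeans_1d(fmar_values, k=5)

# Under the stated assumption the second-highest level sits at 1 - m / 2,
# so the minor fraction m can be read back from that cluster center.
# (The genotype ordering in the comments above holds only for m < 1/3.)
second_highest = sorted(centers)[-2]
minor_fraction = 2 * (1 - second_highest)

print("cluster centers:", [round(c, 4) for c in centers])
print("estimated minor fraction:", round(minor_fraction, 3))
```

For an evenly balanced mixture (m = 0.5) several of these levels coincide under the same assumption, so a smaller k such as 3 would be the natural choice in this sketch. Real sequencing data add noise and allele imbalance that the toy counts do not model.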
first_indexed 2024-03-10T03:49:37Z
format Article
id doaj.art-c2e26296cf964e85a8a6f9523550b453
institution Directory Open Access Journal
issn 2073-4425
language English
last_indexed 2024-03-10T03:49:37Z
publishDate 2022-05-01
publisher MDPI AG
record_format Article
series Genes
spelling doaj.art-c2e26296cf964e85a8a6f9523550b453; 2023-11-23T11:11:03Z; eng; MDPI AG; Genes; ISSN 2073-4425; 2022-05-01; vol. 13, no. 5, art. 884; DOI 10.3390/genes13050884; Yu Yin, Peng Zhang, Yu Xing (all: Department of Forensic Medicine, Chongqing Medical University, #1 Yixueyuan Road, Chongqing 400016, China)
title A New Computational Deconvolution Algorithm for the Analysis of Forensic DNA Mixtures with SNP Markers
topic forensic genetics
bioinformatics
single nucleotide polymorphism (SNP)
massively parallel sequencing (MPS)
Precision ID Identity Panel
DNA mixture deconvolution
url https://www.mdpi.com/2073-4425/13/5/884