Summary:
Background: SNP genotyping through genomic sequencing is increasingly prevalent, yet the amount of
familial information these data contain has not been quantified.
Methods: We provide a framework for measuring the risk that disclosure of a patient's SNP genotypes
poses to the patient's siblings, and demonstrate that sibling SNP genotypes can be inferred with
substantial accuracy.
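To illustrate how a sibling's genotype can be inferred from a patient's, the following minimal sketch assumes a single biallelic SNP in Hardy-Weinberg equilibrium with minor allele frequency q, and the standard full-sibling probabilities of sharing 0, 1, or 2 alleles identical by descent (1/4, 1/2, 1/4). It is not necessarily the procedure used in this study, which draws on HapMap trio data; genotypes are coded as minor-allele counts, and the function names `hwe`, `sib_given_proband`, and `predict_sibling` are hypothetical.

```python
# Assumed full-sibling probabilities of sharing 0, 1 or 2 alleles identical by descent (IBD).
K0, K1, K2 = 0.25, 0.5, 0.25


def hwe(q):
    """Hardy-Weinberg genotype frequencies for 0, 1, 2 copies of the minor allele."""
    p = 1 - q
    return [p * p, 2 * p * q, q * q]


def sib_given_proband(g_p, q):
    """P(sibling genotype | proband genotype g_p) at one biallelic SNP.

    Genotypes are minor-allele counts (0, 1, 2); q is the minor allele frequency.
    """
    p = 1 - q
    # No IBD sharing: the sibling's genotype is an independent population draw.
    dist = [K0 * f for f in hwe(q)]
    # One allele IBD: it is a uniformly chosen copy of the proband's two alleles,
    # and the sibling's other allele is drawn from the population.
    s = g_p / 2  # probability that the shared allele is the minor allele
    dist[0] += K1 * (1 - s) * p
    dist[1] += K1 * ((1 - s) * q + s * p)
    dist[2] += K1 * s * q
    # Two alleles IBD: the sibling carries exactly the proband's genotype.
    dist[g_p] += K2
    return dist


def predict_sibling(g_p, q):
    """Most probable sibling genotype given the proband's genotype."""
    dist = sib_given_proband(g_p, q)
    return max(range(3), key=lambda g: dist[g])


# Proband homozygous for the major allele at a SNP with minor allele frequency 0.20.
print(sib_given_proband(0, 0.20))
print(predict_sibling(0, 0.20))
```

Under these assumptions, when the proband is homozygous for the major allele the sibling shares that genotype with probability 1/4 + p/2 + p²/4 = (1 + p)²/4, where p = 1 - q. At q = 0.20 this is 0.81, and it rises toward 1 as q decreases, so predicting "homozygous major" for the sibling is the natural choice at low-frequency SNPs.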
Results: Extending this inference technique, we determine that a small number of matches at
commonly varying SNPs is sufficient to confirm sibship, demonstrating that published sequence
data can reliably be used to derive sibling identities. Using HapMap trio data, at SNPs where one
child is homozygous for the major allele and the minor allele frequency is ≤ 0.20 (N = 452,684;
65.1%), we achieve 91.9% accuracy in inferring sibling genotypes.
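To see why a few matching genotypes can support sibship, one can compare the probability of a pair of genotypes under the full-sibling hypothesis with its probability for two unrelated people. This sketch assumes Hardy-Weinberg equilibrium, full-sibling IBD-sharing probabilities of 1/4, 1/2, and 1/4, and unlinked SNPs so that per-SNP log-likelihood ratios add. It is an illustration rather than the test used in the study, and the names `sib_lr` and `log10_lr` are hypothetical.

```python
import math

# Assumed full-sibling probabilities of sharing 0, 1 or 2 alleles identical by descent (IBD).
K0, K1, K2 = 0.25, 0.5, 0.25


def hwe(q):
    """Hardy-Weinberg genotype frequencies for 0, 1, 2 copies of the minor allele."""
    p = 1 - q
    return [p * p, 2 * p * q, q * q]


def sib_lr(g1, g2, q):
    """Likelihood ratio P(g1, g2 | full siblings) / P(g1, g2 | unrelated) at one SNP.

    P(g1) appears in both terms and cancels, leaving P(g2 | g1, siblings) / P(g2).
    Requires 0 < q < 1 so that every genotype has nonzero population frequency.
    """
    p = 1 - q
    h = hwe(q)
    s = g1 / 2  # probability that a single IBD allele is the minor allele
    # P(g2 | g1) when exactly one allele is IBD and the other is a population draw.
    one_ibd = [(1 - s) * p, (1 - s) * q + s * p, s * q][g2]
    cond = K0 * h[g2] + K1 * one_ibd + K2 * (g1 == g2)
    return cond / h[g2]


def log10_lr(pairs):
    """Sum per-SNP log10 likelihood ratios, assuming the SNPs are unlinked.

    pairs: iterable of (g1, g2, q), genotypes coded as minor-allele counts.
    """
    return sum(math.log10(sib_lr(g1, g2, q)) for g1, g2, q in pairs)


# Hypothetical data: five SNPs (MAF 0.20) where both people are minor-allele homozygotes.
print(log10_lr([(2, 2, 0.20)] * 5))
```

For example, two people who are both homozygous for the minor allele at a SNP with q = 0.20 give a likelihood ratio of (1 + q)²/(4q²) = 9 in favor of sibship, so a handful of such matches quickly accumulates strong evidence, while opposite homozygotes give a ratio of 1/4 and count against it.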
Conclusion: These findings demonstrate that substantial discrimination and privacy risks arise
from the use of inferred familial genomic data.