A genotype imputation method for de-identified haplotype reference information by using recurrent neural network.

Bibliographic Details
Main Authors: Kaname Kojima, Shu Tadaka, Fumiki Katsuoka, Gen Tamiya, Masayuki Yamamoto, Kengo Kinoshita
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-10-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1008207
Description: Genotype imputation estimates the genotypes of unobserved variants from the genotype data of observed variants, using a collection of haplotypes from thousands of individuals known as a haplotype reference panel. In general, a larger haplotype reference panel yields more accurate imputation results. Most existing genotype imputation methods explicitly require the haplotype reference panel in precise form, but access to haplotype data is often limited because it requires agreements from the donors. Since de-identified information such as summary statistics or model parameters can be used publicly, imputation methods that use de-identified haplotype reference information might improve the quality of imputation results when access to the haplotype data is limited. In this study, we propose a novel imputation method that handles the reference panel as its model parameters by using a bidirectional recurrent neural network (RNN). The model parameters are presented in the form of de-identified information from which restoring individual-level genotype data is almost impossible. Using haplotype datasets from the 1000 Genomes Project (1KGP) and the Haplotype Reference Consortium, we demonstrate that the proposed method achieves imputation accuracy comparable to that of existing imputation methods. We also consider a scenario in which a subset of the haplotypes in the reference panel is available only in de-identified form. In an evaluation on the 1KGP dataset under this scenario, the imputation accuracy of the proposed method is much higher than that of the existing imputation methods. We therefore conclude that our RNN-based method is promising for further promoting the sharing of sensitive genome data amid the recent movement toward protecting individuals' privacy.
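
Illustration (not part of the record): the description says the reference panel is carried only as the parameters of a bidirectional RNN. The toy sketch below shows that general idea under loud assumptions. It is not the authors' architecture: the tanh recurrence, the one-hot allele encoding, the layer sizes, the randomly initialized (untrained) weights, and every function name are hypothetical. A forward pass reads the observed alleles to the left of an unobserved site, a backward pass reads those to the right, and the two final hidden states are combined into a probability of the alternate allele. No reference haplotypes are needed at prediction time, only the parameter values.

```python
import math
import random


def rnn_final_state(inputs, w_in, w_rec, bias):
    """Run a plain tanh RNN over a sequence and return its final hidden state."""
    hidden = [0.0] * len(bias)
    for x in inputs:
        hidden = [
            math.tanh(
                sum(w_in[j][k] * x[k] for k in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_params(hidden_size, seed=0):
    """Hypothetical, untrained parameters standing in for a trained model."""
    rng = random.Random(seed)

    def direction():
        return (
            random_matrix(hidden_size, 2, rng),            # input weights
            random_matrix(hidden_size, hidden_size, rng),  # recurrent weights
            [0.0] * hidden_size,                           # bias
        )

    return {
        "fwd": direction(),
        "bwd": direction(),
        "out": [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden_size)],
        "out_bias": 0.0,
    }


def encode(allele):
    """One-hot encode an observed allele (0 = reference, 1 = alternate)."""
    return [1.0, 0.0] if allele == 0 else [0.0, 1.0]


def impute_alt_probability(left_alleles, right_alleles, params):
    """Estimate P(alternate allele) at an unobserved site between two flanks."""
    # Forward direction: left flank, read toward the unobserved site.
    fwd = rnn_final_state([encode(a) for a in left_alleles], *params["fwd"])
    # Backward direction: right flank, read from the far end toward the site.
    bwd = rnn_final_state(
        [encode(a) for a in reversed(right_alleles)], *params["bwd"]
    )
    features = fwd + bwd  # concatenate the two hidden states
    logit = sum(w * f for w, f in zip(params["out"], features)) + params["out_bias"]
    return 1.0 / (1.0 + math.exp(-logit))
```

For example, `impute_alt_probability([0, 1, 1], [1, 0], make_params(4))` returns a value strictly between 0 and 1. Because the weights here are random, the number carries no biological meaning; a real model would first be trained on a haplotype reference panel.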
ISSN: 1553-734X, 1553-7358
Citation: PLoS Computational Biology 16(10): e1008207 (2020-10-01)