Secure tumor classification by shallow neural network using homomorphic encryption

Abstract. Background: Disclosing patients’ genetic information while applying machine learning techniques to tumor classification compromises the privacy of personal information. Homomorphic Encryption (HE), which supports operations on encrypted data, can be used as one of the tools...

Bibliographic Details
Main Authors: Seungwan Hong, Jai Hyun Park, Wonhee Cho, Hyeongmin Choe, Jung Hee Cheon
Format: Article
Language: English
Published: BMC 2022-04-01
Series: BMC Genomics
Subjects: Homomorphic encryption; Multi-label classification; Privacy; Neural network; Softmax activation
Online Access: https://doi.org/10.1186/s12864-022-08469-w
author Seungwan Hong
Jai Hyun Park
Wonhee Cho
Hyeongmin Choe
Jung Hee Cheon
collection DOAJ
description Abstract.
Background: Disclosing patients’ genetic information while applying machine learning techniques to tumor classification compromises the privacy of personal information. Homomorphic Encryption (HE), which supports operations on encrypted data, is one tool for performing such computation without information leakage, but the limited set of operations that HE supports makes it challenging to apply general machine learning algorithms directly. In particular, non-polynomial activation functions, including softmax, are difficult to implement with HE and require a suitable approximation method to minimize the loss of accuracy. One task of the secure genome analysis competition iDASH 2020 was to build a multi-label tumor classification method that uses HE to predict the class of samples from genetic information.
Methods: We develop a secure multi-label tumor classification method using HE to ensure privacy during all computations of the model inference process. Our solution is based on a 1-layer neural network model with softmax activation and uses an approximate HE scheme. We present an approximation method that enables softmax activation in the model using HE and a technique for efficiently encoding data to reduce computational costs. In addition, we propose an HE-friendly data filtering method to reduce the size of large-scale genetic data.
Results: We analyze a dataset from The Cancer Genome Atlas (TCGA), which consists of 3,622 samples from 11 types of cancer with genetic features from 25,128 genes. Our preprocessing method reduces the number of genes to 4,096 or fewer and achieves a microAUC of 0.9882 (85% accuracy) with a 1-layer shallow neural network. Using our model, we compute the tumor classification inference steps on the encrypted test data in 3.75 minutes. Owing to its high microAUC, our solution was awarded co-first place in iDASH 2020 Track 1: “Secure multi-label Tumor classification using Homomorphic Encryption”.
Conclusions: Our solution is the first implementation of a neural network model with softmax activation using HE. The HE optimization methods presented in this work can also enable other machine learning implementations using HE and other challenging HE applications.
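The abstract notes that softmax is non-polynomial and so must be approximated before it can run under HE, where only additions and multiplications are available. The following is a minimal plaintext sketch of one common way to do this, not the authors' actual approximation, which this record does not describe. It assumes exp(x) is replaced by (1 + x/2^k)^(2^k), computed with k squarings, and that the division is replaced by a Newton iteration for the inverse. The function names, the parameters `k` and `iters`, and the public upper bound `bound` on the sum of exponentials are all hypothetical choices for illustration; ordinary floats stand in for ciphertexts.

```python
import math


def approx_exp(x, k=8):
    """Approximate exp(x) as (1 + x/2^k)^(2^k) using k successive squarings.

    Only additions and multiplications are used, so the same arithmetic
    could in principle be carried out on encrypted values.
    """
    y = 1.0 + x / (1 << k)
    for _ in range(k):
        y = y * y
    return y


def approx_inverse(s, bound, iters=12):
    """Approximate 1/s for 0 < s < bound with Newton's iteration.

    The update y <- y * (2 - s*y) needs no division. Starting from the
    public constant 1/bound keeps s*y inside (0, 1), where it converges.
    """
    y = 1.0 / bound  # assumed public constant, not derived from the data
    for _ in range(iters):
        y = y * (2.0 - s * y)
    return y


def approx_softmax(logits, bound, k=8, iters=12):
    """Softmax built only from the two polynomial approximations above."""
    exps = [approx_exp(z, k) for z in logits]
    total = sum(exps)
    inv_total = approx_inverse(total, bound, iters)
    return [e * inv_total for e in exps]


def exact_softmax(logits):
    """Reference softmax in plaintext, used only for comparison."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [1.0, -0.5, 0.3, 2.0]  # made-up scores for four classes
    print("approx:", approx_softmax(logits, bound=128.0))
    print("exact: ", exact_softmax(logits))
```

In this sketch, `k` and `iters` trade accuracy against multiplicative depth, which is the main cost driver in an approximate HE scheme. The bound must be chosen from a known range of the logits, because the Newton iteration diverges if the sum of exponentials exceeds it.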
first_indexed 2024-12-23T06:06:36Z
format Article
id doaj.art-c86c4799970f403aa2aba50d41e42fef
institution Directory Open Access Journal
issn 1471-2164
language English
last_indexed 2024-12-23T06:06:36Z
publishDate 2022-04-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-c86c4799970f403aa2aba50d41e42fef 2022-12-21T17:57:33Z eng BMC BMC Genomics 1471-2164 2022-04-01 23 1 1 19 10.1186/s12864-022-08469-w
Secure tumor classification by shallow neural network using homomorphic encryption
Seungwan Hong, Jai Hyun Park, Wonhee Cho, Hyeongmin Choe, Jung Hee Cheon (all: Department of Mathematical Sciences, Seoul National University)
https://doi.org/10.1186/s12864-022-08469-w
Homomorphic encryption; Multi-label classification; Privacy; Neural network; Softmax activation
title Secure tumor classification by shallow neural network using homomorphic encryption
topic Homomorphic encryption
Multi-label classification
Privacy
Neural network
Softmax activation
url https://doi.org/10.1186/s12864-022-08469-w