Privacy-Preserving Linear Regression on Distributed Data by Homomorphic Encryption and Data Masking
Linear regression is a basic method that models the relationship between an outcome value and some explanatory values using a linear function. Traditionally, this method is conducted on a clear dataset provided by one data owner. However, in today's ever-increasingly digital world, the data for...
Main Authors: | Guowei Qiu, Xiaolin Gui, Yingliang Zhao |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
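Subjects: | Privacy-preserving regression, linear regression, homomorphic encryption, data masking, multiplicative perturbation |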
Online Access: | https://ieeexplore.ieee.org/document/9110896/ |
_version_ | 1818664356711235584 |
---|---|
author | Guowei Qiu; Xiaolin Gui; Yingliang Zhao |
author_facet | Guowei Qiu; Xiaolin Gui; Yingliang Zhao |
author_sort | Guowei Qiu |
collection | DOAJ |
description | Linear regression is a basic method that models the relationship between an outcome value and some explanatory values using a linear function. Traditionally, this method is conducted on a clear dataset provided by one data owner. However, in today's ever-increasingly digital world, the data for regression analysis are likely distributed among multiple parties and even contain sensitive information about the data owners. In this case, data owners are not willing to share their data unless data privacy is guaranteed. In this paper, we propose a novel protocol for conducting privacy-preserving linear regression (PPLR) on horizontally partitioned data. Our system architecture includes multiple clients and two noncolluding servers. In our protocol, each client submits its data in encrypted form to a server, and two servers collaboratively determine the regression model on pooled data without learning its contents. We construct our protocol with Paillier homomorphic encryption and a new data masking technique. This data masking technique can perturb data by multiplying a rational number while the data are encrypted. Due to the use of the data masking technique, the efficiency of our protocol is greatly improved. We provide an error bound of the protocol and prove it rigorously. We also provide security analysis of the protocol. Finally, we implement our system in C++ and Java, and then we evaluate our protocol using real datasets provided by UCI. The experiments show our protocol is one of the most effective approaches to date and has negligible errors compared with performing linear regression on clear data. |
first_indexed | 2024-12-17T05:31:27Z |
format | Article |
id | doaj.art-886c147115514c3b8adcadefa37ce239 |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-17T05:31:27Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-886c147115514c3b8adcadefa37ce239; 2022-12-21T22:01:43Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8, pp. 107601-107613; DOI 10.1109/ACCESS.2020.3000764; 9110896; Privacy-Preserving Linear Regression on Distributed Data by Homomorphic Encryption and Data Masking; Guowei Qiu (https://orcid.org/0000-0002-0555-0760), Xiaolin Gui (https://orcid.org/0000-0003-4384-9891), Yingliang Zhao (https://orcid.org/0000-0001-5699-6001), all at the School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China; https://ieeexplore.ieee.org/document/9110896/; Privacy-preserving regression; linear regression; homomorphic encryption; data masking; multiplicative perturbation |
title | Privacy-Preserving Linear Regression on Distributed Data by Homomorphic Encryption and Data Masking |
topic | Privacy-preserving regression; linear regression; homomorphic encryption; data masking; multiplicative perturbation |
url | https://ieeexplore.ieee.org/document/9110896/ |
work_keys_str_mv | AT guoweiqiu privacypreservinglinearregressionondistributeddatabyhomomorphicencryptionanddatamasking AT xiaolingui privacypreservinglinearregressionondistributeddatabyhomomorphicencryptionanddatamasking AT yingliangzhao privacypreservinglinearregressionondistributeddatabyhomomorphicencryptionanddatamasking |
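
The description above says the protocol is built on Paillier homomorphic encryption, with clients submitting encrypted data that servers combine without seeing it. The following is a minimal sketch of the two Paillier properties such a design relies on: adding plaintexts by multiplying ciphertexts, and multiplying a plaintext by a known integer by exponentiating its ciphertext. It is not the authors' protocol or their rational-number masking technique. The function names (`keygen`, `encrypt`, `decrypt`, `add`, `scalar_mul`) and the tiny primes are assumptions chosen for illustration, and the key size is far too small for real use.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (illustrative only, insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                  # common simplifying choice of generator
    mu = pow(lam, -1, n)       # with g = n + 1, mu is the inverse of lam mod n
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt integer m (0 <= m < n) as g^m * r^n mod n^2 for random r."""
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover m as L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add(pub, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    n, _ = pub
    return (c1 * c2) % (n * n)

def scalar_mul(pub, c, k):
    """Homomorphic scaling: c^k encrypts k times the plaintext (k a public integer)."""
    n, _ = pub
    return pow(c, k, n * n)

if __name__ == "__main__":
    pub, priv = keygen()
    # Three hypothetical clients each encrypt a local aggregate; a server
    # combines the ciphertexts without decrypting any individual value.
    local_sums = [5, 11, 26]
    total = encrypt(pub, 0)
    for s in local_sums:
        total = add(pub, total, encrypt(pub, s))
    print(decrypt(pub, priv, total))                               # sum of local_sums
    print(decrypt(pub, priv, scalar_mul(pub, encrypt(pub, 7), 6)))  # 7 scaled by 6
```

Because the data are horizontally partitioned, sums over the pooled rows decompose into per-client sums, which is why additive homomorphism alone lets a server pool encrypted contributions. Only the holder of the private key can decrypt the combined result.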