Privacy-Preserving Linear Regression on Distributed Data by Homomorphic Encryption and Data Masking

Linear regression is a basic method that models the relationship between an outcome value and some explanatory values using a linear function. Traditionally, this method is conducted on a clear dataset provided by one data owner. However, in today's increasingly digital world, the data for...

Bibliographic Details
Main Authors: Guowei Qiu, Xiaolin Gui, Yingliang Zhao
Format: Article
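Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Privacy-preserving regression; linear regression; homomorphic encryption; data masking; multiplicative perturbation
Online Access: https://ieeexplore.ieee.org/document/9110896/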
author Guowei Qiu
Xiaolin Gui
Yingliang Zhao
collection DOAJ
description Linear regression is a basic method that models the relationship between an outcome value and some explanatory values using a linear function. Traditionally, this method is conducted on a clear dataset provided by one data owner. However, in today's increasingly digital world, the data for regression analysis are likely distributed among multiple parties and may even contain sensitive information about the data owners. In this case, data owners are not willing to share their data unless data privacy is guaranteed. In this paper, we propose a novel protocol for conducting privacy-preserving linear regression (PPLR) on horizontally partitioned data. Our system architecture includes multiple clients and two noncolluding servers. In our protocol, each client submits its data in encrypted form to a server, and the two servers collaboratively determine the regression model on the pooled data without learning its contents. We construct our protocol with Paillier homomorphic encryption and a new data masking technique. This data masking technique can perturb data by multiplying them by a rational number while the data are encrypted. Due to the use of the data masking technique, the efficiency of our protocol is greatly improved. We provide an error bound for the protocol and prove it rigorously. We also provide a security analysis of the protocol. Finally, we implement our system in C++ and Java, and then we evaluate our protocol using real datasets provided by UCI. The experiments show our protocol is one of the most effective approaches to date and has negligible errors compared with performing linear regression on clear data.
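The two building blocks named in the description, Paillier's additive homomorphism and multiplicative masking of encrypted data, can be illustrated with a minimal Python sketch (Python 3.8+ for the modular inverse via pow). This is not the authors' protocol: the key size is a toy one built from two known Mersenne primes, the mask is a random integer rather than the rational number the paper uses, and the function names (keygen, encrypt, decrypt, add, scale) as well as the split of roles between the parties are assumptions made for illustration only.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair from two Mersenne primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # inverse of lam mod n; valid because g = n + 1 is used
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) with generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n mod n^2
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def add(n, c1, c2):
    """Ciphertext product decrypts to the sum of the plaintexts."""
    return c1 * c2 % (n * n)


def scale(n, c, k):
    """Ciphertext raised to k decrypts to k times the plaintext."""
    return pow(c, k, n * n)


n, priv = keygen()

# Two hypothetical clients encrypt local sums (e.g. partial sums of x*y).
c_a = encrypt(n, 1200)
c_b = encrypt(n, 345)

# A server without the private key pools the encrypted values.
c_sum = add(n, c_a, c_b)

# It masks the pooled value with a random integer before handing it on.
k = secrets.randbelow(2**32) + 1
c_masked = scale(n, c_sum, k)

# The key holder sees only the masked value k * 1545.
masked = decrypt(n, priv, c_masked)

# The masking party removes the mask (exact here because k * 1545 < n).
recovered = masked // k
```

In this sketch the party holding the private key never sees the pooled sum in the clear, and the party applying the mask never holds the key, which is the general idea behind splitting the work between two noncolluding servers.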
first_indexed 2024-12-17T05:31:27Z
format Article
id doaj.art-886c147115514c3b8adcadefa37ce239
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-17T05:31:27Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-886c147115514c3b8adcadefa37ce239 2022-12-21T22:01:43Z eng
IEEE, IEEE Access, ISSN 2169-3536, 2020-01-01, vol. 8, pp. 107601-107613
DOI 10.1109/ACCESS.2020.3000764, IEEE document 9110896
Privacy-Preserving Linear Regression on Distributed Data by Homomorphic Encryption and Data Masking
Guowei Qiu (https://orcid.org/0000-0002-0555-0760), School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
Xiaolin Gui (https://orcid.org/0000-0003-4384-9891), School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
Yingliang Zhao (https://orcid.org/0000-0001-5699-6001), School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
https://ieeexplore.ieee.org/document/9110896/
Privacy-preserving regression; linear regression; homomorphic encryption; data masking; multiplicative perturbation
title Privacy-Preserving Linear Regression on Distributed Data by Homomorphic Encryption and Data Masking
topic Privacy-preserving regression
linear regression
homomorphic encryption
data masking
multiplicative perturbation
url https://ieeexplore.ieee.org/document/9110896/