Privacy-Preserving Machine Learning on Apache Spark
The adoption of third-party machine learning (ML) cloud services is highly dependent on the security guarantees and the performance penalty they incur on workloads for model training and inference. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its...
Main Authors: | Claudia V. Brito, Pedro G. Ferreira, Bernardo L. Portela, Rui C. Oliveira, Joao T. Paulo |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | Privacy-preserving; machine learning; distributed systems; apache spark; trusted execution environments; Intel SGX |
Online Access: | https://ieeexplore.ieee.org/document/10314994/ |
_version_ | 1797545216504233984 |
---|---|
author | Claudia V. Brito Pedro G. Ferreira Bernardo L. Portela Rui C. Oliveira Joao T. Paulo |
author_facet | Claudia V. Brito Pedro G. Ferreira Bernardo L. Portela Rui C. Oliveira Joao T. Paulo |
author_sort | Claudia V. Brito |
collection | DOAJ |
description | The adoption of third-party machine learning (ML) cloud services is highly dependent on the security guarantees and the performance penalty they incur on workloads for model training and inference. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g. statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks. In more detail, we propose Soteria, a system for distributed privacy-preserving ML that leverages Trusted Execution Environments (e.g. Intel SGX) to run computations over sensitive information in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed at trusted enclaves, we introduce a hybrid scheme, combining computation done inside and outside these enclaves. The experimental evaluation validates that our approach reduces the runtime of ML algorithms by up to 41% when compared to previous related work. Our protocol is accompanied by a security proof and a discussion regarding resilience against a wide spectrum of ML attacks. |
first_indexed | 2024-03-10T14:12:20Z |
format | Article |
id | doaj.art-0e6c726e85be44e69198d2b860a9bc92 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-10T14:12:20Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-0e6c726e85be44e69198d2b860a9bc92 | 2023-11-21T00:01:40Z | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2023-01-01 | Vol. 11, pp. 127907–127930 | DOI 10.1109/ACCESS.2023.3332222 | IEEE document 10314994 | Privacy-Preserving Machine Learning on Apache Spark | Claudia V. Brito (https://orcid.org/0000-0003-4293-9887), Pedro G. Ferreira (https://orcid.org/0000-0003-3838-8664), Bernardo L. Portela (https://orcid.org/0000-0002-7203-2621), Rui C. Oliveira, Joao T. Paulo (https://orcid.org/0000-0001-9752-2822) | INESC TEC, Porto, Portugal (all five authors) | https://ieeexplore.ieee.org/document/10314994/ | Privacy-preserving; machine learning; distributed systems; apache spark; trusted execution environments; Intel SGX |
spellingShingle | Claudia V. Brito Pedro G. Ferreira Bernardo L. Portela Rui C. Oliveira Joao T. Paulo Privacy-Preserving Machine Learning on Apache Spark IEEE Access Privacy-preserving machine learning distributed systems apache spark trusted execution environments Intel SGX |
title | Privacy-Preserving Machine Learning on Apache Spark |
title_full | Privacy-Preserving Machine Learning on Apache Spark |
title_fullStr | Privacy-Preserving Machine Learning on Apache Spark |
title_full_unstemmed | Privacy-Preserving Machine Learning on Apache Spark |
title_short | Privacy-Preserving Machine Learning on Apache Spark |
title_sort | privacy preserving machine learning on apache spark |
topic | Privacy-preserving machine learning distributed systems apache spark trusted execution environments Intel SGX |
url | https://ieeexplore.ieee.org/document/10314994/ |
work_keys_str_mv | AT claudiavbrito privacypreservingmachinelearningonapachespark AT pedrogferreira privacypreservingmachinelearningonapachespark AT bernardolportela privacypreservingmachinelearningonapachespark AT ruicoliveira privacypreservingmachinelearningonapachespark AT joaotpaulo privacypreservingmachinelearningonapachespark |
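The abstract describes a hybrid scheme that keeps sensitive ML computation inside SGX enclaves while revealing carefully chosen non-sensitive operations (e.g. statistical calculations) to untrusted executors. A minimal sketch of that routing idea follows; all names (`dispatch`, `run_in_enclave`, the operation sets) are hypothetical illustrations, not Soteria's actual API.

```python
# Illustrative sketch of hybrid operation routing, per the abstract's insight.
# Assumption: the operation whitelist and function names are invented for this example.

SENSITIVE_OPS = {"gradient_update", "model_predict"}   # touch raw sensitive data
REVEALABLE_OPS = {"row_count", "column_mean"}          # non-sensitive statistics

def run_in_enclave(op: str, partition: list) -> str:
    # Placeholder for execution inside an isolated SGX enclave.
    return f"enclave:{op}:{len(partition)}"

def run_untrusted(op: str, partition: list) -> str:
    # Placeholder for plaintext execution on untrusted Spark workers.
    return f"untrusted:{op}:{len(partition)}"

def dispatch(op: str, partition: list) -> str:
    """Route each operation: enclave for sensitive work, plaintext only when whitelisted."""
    if op in REVEALABLE_OPS:
        return run_untrusted(op, partition)
    # Fail closed: anything not explicitly revealable runs inside the enclave.
    return run_in_enclave(op, partition)
```

The fail-closed default reflects the security framing: revealing an operation is an explicit, per-operation decision, not the fallback.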