In-Memory Data Anonymization Using Scalable and High Performance RDD Design
Main Authors: | Sibghat Ullah Bazai, Julian Jang-Jaccard |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-10-01 |
Series: | Electronics |
Online Access: | https://www.mdpi.com/2079-9292/9/10/1732 |
author | Sibghat Ullah Bazai, Julian Jang-Jaccard |
collection | DOAJ |
description | Recent studies of data anonymization techniques have focused primarily on MapReduce. However, existing MapReduce-based approaches often suffer from performance overheads caused by inappropriate data allocation, expensive disk I/O and network transfer, and a lack of support for iterative tasks. We propose “SparkDA”, a novel anonymization technique designed to take full advantage of the Spark platform to generate privacy-preserving anonymized datasets as efficiently as possible. Our proposal offers better partition control, in-memory operation, and cache management for the iterative operations that are heavily utilised in data anonymization processing. It is based on Spark’s Resilient Distributed Dataset (RDD) and two critical RDD operations, FlatMapRDD and ReduceByKeyRDD. The experimental results demonstrate that our proposal outperforms existing approaches in terms of performance and scalability while maintaining high levels of data privacy and utility. This illustrates that our proposal can be used in a wider range of big data applications that demand privacy. |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
doi | 10.3390/electronics9101732 |
volume | 9 |
issue | 10 |
article_number | 1732 |
affiliation | Cybersecurity Lab, Computer Science/Information Technology, Massey University, Auckland 0632, New Zealand (both authors) |
title | In-Memory Data Anonymization Using Scalable and High Performance RDD Design |
topic | high performance; data anonymization; scalability; Spark; big data mining; privacy and utility |
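The abstract names a flatMap / reduceByKey dataflow over RDDs but gives no further detail. As a hedged illustration only, the plain-Python sketch below imitates that pattern for a k-anonymity check: `flat_map` and `reduce_by_key` are local stand-ins for Spark's `RDD.flatMap` and `RDD.reduceByKey` (no Spark is needed to run it), while the generalization hierarchy, the helper names (`generalize`, `min_class_sizes`, `lowest_k_anonymous_level`) and the toy records are hypothetical assumptions. It is not the SparkDA algorithm described in the article.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")
K = TypeVar("K")
V = TypeVar("V")

# Hypothetical quasi-identifiers for one record: (age, ZIP code).
Record = Tuple[int, str]


def flat_map(data: Iterable[T], fn: Callable[[T], Iterable[U]]) -> List[U]:
    """Local stand-in for RDD.flatMap: apply fn to every element and flatten."""
    return [out for item in data for out in fn(item)]


def reduce_by_key(pairs: Iterable[Tuple[K, V]], fn: Callable[[V, V], V]) -> Dict[K, V]:
    """Local stand-in for RDD.reduceByKey: merge all values sharing a key with fn."""
    merged: Dict[K, V] = {}
    for key, value in pairs:
        merged[key] = fn(merged[key], value) if key in merged else value
    return merged


def generalize(record: Record, level: int) -> Tuple[str, str]:
    """Hypothetical hierarchy: level 0 keeps exact values; each higher level
    doubles the age band (10, 20, 40, ...) and masks one more ZIP digit."""
    age, zip_code = record
    if level == 0:
        age_label = str(age)
    else:
        width = 10 * 2 ** (level - 1)
        low = (age // width) * width
        age_label = f"{low}-{low + width - 1}"
    kept = max(len(zip_code) - level, 0)
    return age_label, zip_code[:kept] + "*" * (len(zip_code) - kept)


def min_class_sizes(records: List[Record], max_level: int) -> Dict[int, int]:
    """Smallest equivalence-class size at every generalization level."""
    # flatMap: each record emits one ((level, generalized QI), 1) pair per level.
    pairs = flat_map(
        records,
        lambda r: [((lvl, generalize(r, lvl)), 1) for lvl in range(max_level + 1)],
    )
    # reduceByKey: sum the 1s to get the size of each equivalence class.
    counts = reduce_by_key(pairs, lambda a, b: a + b)
    smallest: Dict[int, int] = {}
    for (lvl, _), size in counts.items():
        smallest[lvl] = min(size, smallest.get(lvl, size))
    return smallest


def lowest_k_anonymous_level(records: List[Record], k: int, max_level: int) -> Optional[int]:
    """Least generalized level whose every class has >= k records, else None."""
    sizes = min_class_sizes(records, max_level)
    for lvl in range(max_level + 1):
        if sizes.get(lvl, 0) >= k:
            return lvl
    return None


# Hypothetical toy data set, not taken from the article's experiments.
SAMPLE_RECORDS: List[Record] = [
    (23, "47677"), (27, "47602"), (35, "47678"),
    (38, "47905"), (24, "47606"), (33, "47909"),
]

if __name__ == "__main__":
    level = lowest_k_anonymous_level(SAMPLE_RECORDS, k=2, max_level=3)
    print(level)  # → 2
    if level is not None:
        print([generalize(r, level) for r in SAMPLE_RECORDS])
```

Emitting one pair per level inside the flatMap lets a single pass and a single reduceByKey score every candidate level at once. In PySpark the same shape would roughly be `sc.parallelize(records).cache()` followed by `.flatMap(...)` and `.reduceByKey(operator.add)`; caching the input RDD is what allows an iterative anonymizer to reuse it across rounds in memory instead of re-reading it from disk, the kind of reuse the abstract contrasts with MapReduce.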