Incremental $k$-Anonymous Microaggregation in Large-Scale Electronic Surveys With Optimized Scheduling

Improvements in technology have led to enormous volumes of detailed personal information made available for any number of statistical studies. This has stimulated the need for anonymization techniques striving to attain a difficult compromise between the usefulness of the data and the protection of...

Bibliographic Details
Main Authors: David Rebollo-Monedero, Cesar Hernandez-Baigorri, Jordi Forne, Miguel Soriano
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
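Subjects: Data privacy; statistical disclosure control; k-anonymity; microaggregation; electronic surveys; large-scale datasets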
Online Access: https://ieeexplore.ieee.org/document/8491270/
_version_ 1818874788378050560
author David Rebollo-Monedero
Cesar Hernandez-Baigorri
Jordi Forne
Miguel Soriano
author_facet David Rebollo-Monedero
Cesar Hernandez-Baigorri
Jordi Forne
Miguel Soriano
author_sort David Rebollo-Monedero
collection DOAJ
description Improvements in technology have led to enormous volumes of detailed personal information made available for any number of statistical studies. This has stimulated the need for anonymization techniques striving to attain a difficult compromise between the usefulness of the data and the protection of our privacy. k-Anonymous microaggregation permits releasing a dataset where each person remains indistinguishable from k - 1 other individuals, through the aggregation of demographic attributes, otherwise a potential culprit for respondent reidentification. Although privacy guarantees are by no means absolute, the elegant simplicity of the k-anonymity criterion and the excellent preservation of information utility of microaggregation algorithms have turned them into widely popular approaches whenever data utility is critical. Unfortunately, high-utility algorithms on large datasets inherently require extensive computation. This paper addresses the need to run k-anonymous microaggregation efficiently with mild distortion loss, exploiting the fact that the data may arrive over an extended period of time. Specifically, we propose to split the original dataset into two portions that will be processed one after the other, allowing the first process to start before the entire dataset is received while leveraging the superlinearity of the involved microaggregation algorithms. A detailed mathematical formulation enables us to calculate the optimal time for the fastest anonymization as well as for minimum distortion under a given deadline. Two incremental microaggregation algorithms are devised, for which extensive experimentation is reported. The presented theoretical methodology should prove invaluable in numerous data-collection applications, including large-scale electronic surveys in which computation is possible as the data come in.
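The scheduling idea in the description can be illustrated with a small numerical sketch. It is not the paper's formulation: it assumes, purely for illustration, that records arrive at a constant rate over a collection window of length T, and that microaggregating n records takes c * n**p time units with p > 1 (superlinear). The names completion_time and best_split and all parameter values are hypothetical. The sketch models running time only and ignores the distortion penalty that splitting may introduce.

```python
def completion_time(f, n, T, c, p=2.0):
    """Finish time when the first run covers the fraction f of the records.

    Assumptions (not from the source): uniform arrivals over [0, T] and a
    running time of c * size**p for a single microaggregation run.
    """
    first_size = f * n
    second_size = (1.0 - f) * n
    # The first run can start once its records have arrived, at time f * T.
    first_end = f * T + c * first_size ** p
    # The second run needs all remaining records (time T) and a free processor.
    second_start = max(T, first_end)
    return second_start + c * second_size ** p


def best_split(n, T, c, p=2.0, steps=1000):
    """Grid search for the split fraction with the earliest finish time."""
    candidates = [i / steps for i in range(1, steps + 1)]
    return min(candidates, key=lambda f: completion_time(f, n, T, c, p))


if __name__ == "__main__":
    n, T, c = 10_000, 100.0, 1e-6  # hypothetical survey size, window, cost constant
    f = best_split(n, T, c)
    print("best split fraction:", f)
    print("incremental finish time:", completion_time(f, n, T, c))
    print("single-run finish time:", completion_time(1.0, n, T, c))
```

Setting f = 1 reproduces the non-incremental baseline, in which a single run starts only after all data have been received. Under these assumptions the split pays off because two smaller runs cost less in total than one large run, and the first of them overlaps with data collection.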
first_indexed 2024-12-19T13:16:10Z
format Article
id doaj.art-c24b933268fb47be9fb4ed947c92544d
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-19T13:16:10Z
publishDate 2018-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-c24b933268fb47be9fb4ed947c92544d
2022-12-21T20:19:49Z
eng
IEEE
IEEE Access
2169-3536
2018-01-01
6
60016
60044
10.1109/ACCESS.2018.2875949
8491270
Incremental $k$-Anonymous Microaggregation in Large-Scale Electronic Surveys With Optimized Scheduling
David Rebollo-Monedero 0 https://orcid.org/0000-0002-0783-2382
Cesar Hernandez-Baigorri 1
Jordi Forne 2
Miguel Soriano 3
Department of Telematic Engineering, Universitat Politècnica de Catalunya, Barcelona, Spain
Department of Telematic Engineering, Universitat Politècnica de Catalunya, Barcelona, Spain
Department of Telematic Engineering, Universitat Politècnica de Catalunya, Barcelona, Spain
Department of Telematic Engineering, Universitat Politècnica de Catalunya, Barcelona, Spain
Improvements in technology have led to enormous volumes of detailed personal information made available for any number of statistical studies. This has stimulated the need for anonymization techniques striving to attain a difficult compromise between the usefulness of the data and the protection of our privacy. k-Anonymous microaggregation permits releasing a dataset where each person remains indistinguishable from k - 1 other individuals, through the aggregation of demographic attributes, otherwise a potential culprit for respondent reidentification. Although privacy guarantees are by no means absolute, the elegant simplicity of the k-anonymity criterion and the excellent preservation of information utility of microaggregation algorithms have turned them into widely popular approaches whenever data utility is critical. Unfortunately, high-utility algorithms on large datasets inherently require extensive computation. This paper addresses the need to run k-anonymous microaggregation efficiently with mild distortion loss, exploiting the fact that the data may arrive over an extended period of time. Specifically, we propose to split the original dataset into two portions that will be processed one after the other, allowing the first process to start before the entire dataset is received while leveraging the superlinearity of the involved microaggregation algorithms. A detailed mathematical formulation enables us to calculate the optimal time for the fastest anonymization as well as for minimum distortion under a given deadline. Two incremental microaggregation algorithms are devised, for which extensive experimentation is reported. The presented theoretical methodology should prove invaluable in numerous data-collection applications, including large-scale electronic surveys in which computation is possible as the data come in.
https://ieeexplore.ieee.org/document/8491270/
Data privacy
statistical disclosure control
k-anonymity
microaggregation
electronic surveys
large-scale datasets
spellingShingle David Rebollo-Monedero
Cesar Hernandez-Baigorri
Jordi Forne
Miguel Soriano
Incremental $k$-Anonymous Microaggregation in Large-Scale Electronic Surveys With Optimized Scheduling
IEEE Access
Data privacy
statistical disclosure control
k-anonymity
microaggregation
electronic surveys
large-scale datasets
title Incremental $k$-Anonymous Microaggregation in Large-Scale Electronic Surveys With Optimized Scheduling
title_full Incremental $k$-Anonymous Microaggregation in Large-Scale Electronic Surveys With Optimized Scheduling
title_fullStr Incremental $k$-Anonymous Microaggregation in Large-Scale Electronic Surveys With Optimized Scheduling
title_full_unstemmed Incremental $k$-Anonymous Microaggregation in Large-Scale Electronic Surveys With Optimized Scheduling
title_short Incremental $k$-Anonymous Microaggregation in Large-Scale Electronic Surveys With Optimized Scheduling
title_sort incremental k anonymous microaggregation in large scale electronic surveys with optimized scheduling
topic Data privacy
statistical disclosure control
k-anonymity
microaggregation
electronic surveys
large-scale datasets
url https://ieeexplore.ieee.org/document/8491270/
work_keys_str_mv AT davidrebollomonedero incrementalkanonymousmicroaggregationinlargescaleelectronicsurveyswithoptimizedscheduling
AT cesarhernandezbaigorri incrementalkanonymousmicroaggregationinlargescaleelectronicsurveyswithoptimizedscheduling
AT jordiforne incrementalkanonymousmicroaggregationinlargescaleelectronicsurveyswithoptimizedscheduling
AT miguelsoriano incrementalkanonymousmicroaggregationinlargescaleelectronicsurveyswithoptimizedscheduling