Certifiable Unlearning Pipelines for Logistic Regression: An Experimental Study

Machine unlearning is the task of updating machine learning (ML) models after a subset of their training data is deleted. Methods for the task should combine effectiveness and efficiency (i.e., they should effectively “unlearn” deleted...

Bibliographic Details
Main Authors:	Ananth Mahadevan, Michael Mathioudakis
Format:	Article
Language:	English
Published:	MDPI AG 2022-06-01
Series:	Machine Learning and Knowledge Extraction
Subjects:	machine unlearning; pipelines; logistic regression
Online Access:	https://www.mdpi.com/2504-4990/4/3/28

collection	DOAJ
description	Machine unlearning is the task of updating machine learning (ML) models after a subset of their training data is deleted. Methods for the task should combine effectiveness and efficiency: they should effectively “unlearn” deleted data without requiring excessive computational effort (e.g., a full retraining) for a small number of deletions. Such a combination is typically achieved by tolerating some amount of approximation in the unlearning. In addition, laws and regulations in the spirit of “the right to be forgotten” have given rise to requirements for certifiability (i.e., the ability to demonstrate that the deleted data has indeed been unlearned by the ML model). In this paper, we present an experimental study of three state-of-the-art approximate unlearning methods for logistic regression and demonstrate the trade-offs between efficiency, effectiveness and certifiability offered by each method. In implementing this study, we extend some of the existing works and describe a common unlearning pipeline to compare and evaluate the unlearning methods on six real-world datasets and in a variety of settings. We provide insights into the effect of the quantity and distribution of the deleted data on ML models and into the performance of each unlearning method in different settings. We also propose a practical online strategy to determine when the accumulated error from approximate unlearning is large enough to warrant a full retraining of the ML model.
first_indexed	2024-03-09T23:20:10Z
id	doaj.art-13278c37d40e4df8869831b2dc39845e
institution	Directory of Open Access Journals
issn	2504-4990
last_indexed	2024-03-09T23:20:10Z
spelling	doaj.art-13278c37d40e4df8869831b2dc39845e; 2023-11-23T17:28:06Z; eng; MDPI AG; Machine Learning and Knowledge Extraction; ISSN 2504-4990; 2022-06-01; vol. 4, no. 3, pp. 591-620; doi:10.3390/make4030028; Certifiable Unlearning Pipelines for Logistic Regression: An Experimental Study; Ananth Mahadevan, Michael Mathioudakis (both: Department of Computer Science, University of Helsinki, 00014 Helsinki, Finland); https://www.mdpi.com/2504-4990/4/3/28; machine unlearning, pipelines, logistic regression

Similar Items