Detecting and Isolating Adversarial Attacks Using Characteristics of the Surrogate Model Framework

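The paper introduces a novel framework for detecting adversarial attacks on machine learning models that classify tabular data. Its purpose is to provide a robust method for monitoring and continuously auditing machine learning models in order to detect malicious data alterations. The core of the framework is a set of machine learning classifiers that detect attacks and identify their type, operating on diagnostic attributes. These diagnostic attributes are obtained not from the original model but from a surrogate model created by observing the original model's inputs and outputs. The paper presents the building blocks of the framework and tests its ability to detect and isolate attacks in selected scenarios, using known attacks and public machine learning data sets. The results pave the way for further experiments toward classifiers that can be integrated into real-world scenarios, bolstering the robustness of machine learning applications.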

Bibliographic Details
Main Authors:	Piotr Biczyk, Łukasz Wawrowski
Format:	Article
Language:	English
Published:	MDPI AG 2023-08-01
Series:	Applied Sciences
Subjects:	adversarial attacks; explainable artificial intelligence; surrogate models; diagnostic attributes; trustworthy AI
Online Access:	https://www.mdpi.com/2076-3417/13/17/9698

author	Piotr Biczyk, Łukasz Wawrowski
collection	DOAJ
description	The paper introduces a novel framework for detecting adversarial attacks on machine learning models that classify tabular data. Its purpose is to provide a robust method for monitoring and continuously auditing machine learning models in order to detect malicious data alterations. The core of the framework is a set of machine learning classifiers that detect attacks and identify their type, operating on diagnostic attributes. These diagnostic attributes are obtained not from the original model but from a surrogate model created by observing the original model's inputs and outputs. The paper presents the building blocks of the framework and tests its ability to detect and isolate attacks in selected scenarios, using known attacks and public machine learning data sets. The results pave the way for further experiments toward classifiers that can be integrated into real-world scenarios, bolstering the robustness of machine learning applications.
first_indexed	2024-03-10T23:28:00Z
format	Article
id	doaj.art-bca413c6e48540adb5daca1df09c3539
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-10T23:28:00Z
publishDate	2023-08-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-bca413c6e48540adb5daca1df09c3539 | 2023-11-19T07:50:35Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2023-08-01 | 13 | 17 | 9698 | 10.3390/app13179698 | Detecting and Isolating Adversarial Attacks Using Characteristics of the Surrogate Model Framework | Piotr Biczyk [0] | Łukasz Wawrowski [1] | Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland | Łukasiewicz Research Network, Institute of Innovative Technologies EMAG, Leopolda 31, 40-189 Katowice, Poland | The paper introduces a novel framework for detecting adversarial attacks on machine learning models that classify tabular data. Its purpose is to provide a robust method for monitoring and continuously auditing machine learning models in order to detect malicious data alterations. The core of the framework is a set of machine learning classifiers that detect attacks and identify their type, operating on diagnostic attributes. These diagnostic attributes are obtained not from the original model but from a surrogate model created by observing the original model's inputs and outputs. The paper presents the building blocks of the framework and tests its ability to detect and isolate attacks in selected scenarios, using known attacks and public machine learning data sets. The results pave the way for further experiments toward classifiers that can be integrated into real-world scenarios, bolstering the robustness of machine learning applications. | https://www.mdpi.com/2076-3417/13/17/9698 | adversarial attacks | explainable artificial intelligence | surrogate models | diagnostic attributes | trustworthy AI
title	Detecting and Isolating Adversarial Attacks Using Characteristics of the Surrogate Model Framework
topic	adversarial attacks; explainable artificial intelligence; surrogate models; diagnostic attributes; trustworthy AI
url	https://www.mdpi.com/2076-3417/13/17/9698

Similar Items