Towards robust explainability of deep neural networks against attribution attacks

Deep learning techniques have been rapidly developed and widely applied in various fields. However, the black-box nature of deep neural networks (DNNs) makes it difficult to understand their decision-making process, giving rise to the field of explainable artificial intelligence (XAI). Attribution methods are among the most popular XAI methods, aiming to explain a DNN's prediction by attributing it to the input features. Unfortunately, these attribution methods are vulnerable to adversarial attacks, which can mislead the attribution results. To address this problem, this thesis develops attribution protection methods to defend against such attacks, using both empirical and theoretical approaches: the empirical approaches improve attribution robustness, while the theoretical approaches characterize the worst-case attribution deviations after the inputs are perturbed. The effectiveness of the proposed methods is established with rigorous analysis and proofs, and their performance is validated on various datasets and against different types of attacks, in comparison with state-of-the-art methods.
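
For context on the abstract's terminology, the following is a minimal sketch of one of the simplest gradient-based attribution methods (input times gradient) and of the kind of objective an attribution attack optimizes. It is an illustration written in PyTorch under assumed placeholder names (model, x, delta, target_class); it is not the protection method proposed in the thesis.

    # Minimal illustrative sketch (PyTorch), not the thesis's method.
    # `model`, `x`, `delta`, and `target_class` are hypothetical placeholders.
    import torch

    def input_x_gradient(model, x, target_class):
        # One of the simplest attribution methods: attribute the logit of
        # `target_class` to input features via element-wise input * gradient.
        x = x.clone().detach().requires_grad_(True)
        logit = model(x)[0, target_class]  # scalar logit of the explained class
        logit.backward()                   # gradient of the logit w.r.t. the input
        return (x * x.grad).detach()       # per-feature attribution map

    def attribution_divergence(model, x, delta, target_class):
        # An attribution attack searches for a small perturbation `delta`
        # (e.g., ||delta||_inf <= eps) that keeps the prediction unchanged
        # while maximizing a dissimilarity such as this negative cosine
        # similarity between the clean and perturbed attribution maps.
        a_clean = input_x_gradient(model, x, target_class).flatten()
        a_pert = input_x_gradient(model, x + delta, target_class).flatten()
        return -torch.nn.functional.cosine_similarity(a_clean, a_pert, dim=0)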


Bibliographic Details
Main Author: Wang, Fan
Other Authors: Kong Wai-Kin, Adams; Privault, Nicolas
School: Interdisciplinary Graduate School (IGS); Rapid-Rich Object Search (ROSE) Lab
Format: Thesis (Doctor of Philosophy)
Language: English
Published: Nanyang Technological University, 2024
Subjects: Computer and Information Science; Mathematical Sciences; Trustworthy machine learning
DOI: 10.32657/10356/175394
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Citation: Wang, F. (2024). Towards robust explainability of deep neural networks against attribution attacks. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175394
Online Access: https://hdl.handle.net/10356/175394