FedFingerprinting: A Federated Learning Approach to Website Fingerprinting Attacks in Tor Networks

Bibliographic Details
Main Authors: Juneseok Bang, Jaewon Jeong, Joohyung Lee
Format: Article
Language: English
Published: IEEE, 2023
Series: IEEE Access
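Subjects: Tor networks; website fingerprinting attacks; federated learning; feature analysis; deep learning; machine learning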
Online Access: https://ieeexplore.ieee.org/document/10194906/

Abstract
Various website fingerprinting (WF) attacks have been developed to detect anonymous users who access illegal websites through Tor networks by analyzing Tor traffic. These attacks use traffic features such as packet length, number of packets, and timing to identify users who attempt to access prohibited content. With advances in artificial intelligence (AI), machine learning and deep learning techniques have been widely adopted for WF to build accurate models that break the privacy of such users. However, these state-of-the-art WF approaches assume that all data from the various Tor nodes is collected and used to train the model centrally. This assumption is problematic for two reasons: training data sets from Tor nodes may contain sensitive information that the nodes do not want to share, and collecting and training on data from many nodes in one place inevitably creates significant computing and network bottlenecks at the central server.

Accordingly, this paper proposes FedFingerprinting, a novel framework that applies federated learning (FL) to WF in the Tor network so that Tor nodes can build a global model collaboratively without exposing their local data sets. First, to reduce the local training burden on the Tor nodes selected for the FL process, the importance of the handcrafted features used for WF is evaluated by analyzing their accuracy under an ensemble of tree-based machine learning methods. Then, to balance accuracy against training time, the model is trained with FL on a combination of the selected top-ranked features rather than on raw data. In addition, a Tor node selection scheme for the FL process is designed that takes the local model accuracy of each Tor node into account.
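
The feature-ranking step can be illustrated with a minimal, self-contained Python sketch. This record does not list the paper's handcrafted features or the specific tree ensembles it uses, so everything below is an assumption: the names in `FEATURE_NAMES` are hypothetical examples built from the attributes the abstract mentions (packet length, number of packets, time), a trace is assumed to be a list of `(timestamp, signed_size)` pairs with positive sizes for outgoing packets, and the ensemble is a deliberately simple stand-in (bagged decision stumps scored by mean decrease in Gini impurity) rather than the authors' models.

```python
import random
from collections import Counter

# Hypothetical handcrafted features built from the attributes the abstract names
# (packet length, number of packets, time); not the paper's actual feature set.
FEATURE_NAMES = ["n_packets", "n_outgoing", "n_incoming",
                 "total_bytes", "duration", "mean_iat"]


def extract_features(trace):
    """Map one trace of (timestamp_s, signed_size) pairs to a feature vector.

    Assumed convention: size > 0 is an outgoing packet, size < 0 an incoming one.
    """
    times = [t for t, _ in trace]
    sizes = [s for _, s in trace]
    n = len(trace)
    duration = times[-1] - times[0] if n > 1 else 0.0
    return [
        n,
        sum(1 for s in sizes if s > 0),
        sum(1 for s in sizes if s < 0),
        sum(abs(s) for s in sizes),
        duration,
        duration / (n - 1) if n > 1 else 0.0,  # mean inter-arrival time
    ]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(X, y, features):
    """Best single threshold split over `features`: (feature_index, impurity_decrease)."""
    n = len(y)
    parent = gini(y)
    best_f, best_gain = None, 0.0
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best_gain:  # strict >: the first split found wins ties
                best_f, best_gain = f, gain
    return best_f, best_gain


def rank_features(X, y, names, n_trees=25, max_features=None, seed=0):
    """Rank features by mean decrease in Gini impurity over bagged decision stumps."""
    rng = random.Random(seed)
    d = len(names)
    importance = [0.0] * d
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap sample
        feats = sorted(rng.sample(range(d), max_features or d))  # random feature subset
        f, gain = best_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        if f is not None:
            importance[f] += gain
    total = sum(importance) or 1.0
    scores = [(name, imp / total) for name, imp in zip(names, importance)]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)
```

Keeping the first k entries of the ranking, for example `[name for name, _ in ranking[:k]]`, yields the reduced feature vector each Tor node would train on locally; k is the knob that trades accuracy against training time.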

Finally, in closed-world settings with real-world Tor data sets, we empirically compare FedFingerprinting, trained on raw data and on selected features, against various benchmarks in terms of training time and accuracy. We then evaluate FedFingerprinting with Tor node selection in terms of convergence speed and show its superior performance.
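
The federated training and Tor node selection steps can be sketched as follows. This is a hypothetical illustration, not the authors' algorithm: it uses FedAvg (sample-size-weighted averaging of local weights) over a binary logistic-regression model, and the selection rule (aggregate only the `m` nodes whose locally trained models are most accurate on their own data) is an assumed reading of "considering the local model accuracy of each Tor node". Each node shares only its weights and an accuracy score, never its raw traces. The per-round accuracy in `history` is the kind of curve one would inspect when comparing convergence speed.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """Probability of class 1; the last weight is the bias."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def accuracy(w, data):
    return sum((predict(w, x) >= 0.5) == (y == 1) for x, y in data) / len(data)


def local_train(w, data, lr=0.5, epochs=5, seed=0):
    """Plain SGD on the logistic loss, run on one Tor node's private data."""
    rng = random.Random(seed)
    w = list(w)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w


def fedavg(updates):
    """FedAvg: average (weights, n_samples) pairs, weighted by sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[j] * n for w, n in updates) / total for j in range(dim)]


def run_fl(nodes, dim, rounds=10, m=3):
    """nodes: one private data set [(x, y), ...] per Tor node.

    Only weights and a local accuracy score leave each node.
    """
    w = [0.0] * (dim + 1)
    history = []
    for r in range(rounds):
        reports = []
        for k, data in enumerate(nodes):
            local_w = local_train(w, data, seed=r * len(nodes) + k)
            reports.append((accuracy(local_w, data), local_w, len(data)))
        # Hypothetical selection rule: aggregate only the m most accurate local models.
        reports.sort(key=lambda rep: rep[0], reverse=True)
        w = fedavg([(lw, n) for _, lw, n in reports[:m]])
        # Illustration only: a real deployment would score a held-out test set
        # instead of pooling the nodes' data.
        history.append(accuracy(w, [s for data in nodes for s in data]))
    return w, history
```

Setting `m` equal to the number of nodes recovers plain FedAvg with every node participating, which serves as a baseline when comparing convergence with and without node selection.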

Publication Details
Volume: 11
Pages: 78431-78444
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3299174
Affiliation (all authors): Department of Computing, Gachon University, Seongnam, South Korea
ORCID iDs: Juneseok Bang 0009-0005-0301-1308; Jaewon Jeong 0009-0002-2990-185X; Joohyung Lee 0000-0003-1102-3905
Source: Directory of Open Access Journals (DOAJ)