Classifying Tor Traffic Encrypted Payload Using Machine Learning

Bibliographic Details
Main Authors: Pitpimon Choorod, George Weir, Anil Fernando
Format: Article
Language: English
Published: IEEE 2024-01-01
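Series: IEEE Access
Subjects: Network traffic classification; Tor network; machine learning; encrypted payload features; character analysis
Online Access: https://ieeexplore.ieee.org/document/10409147/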
_version_ 1797316053054783488
author Pitpimon Choorod
George Weir
Anil Fernando
collection DOAJ
description Tor, a network offering Internet anonymity, presented both positive and potentially malicious applications, leading to the need for efficient Tor traffic monitoring. While most current traffic classification methods rely on flow-based features, these can be unreliable due to factors like asymmetric routing, and the use of multiple packets for feature computation can lead to processing delays. Recognising the multi-layered encryption of Tor compared to nonTor encrypted payloads, our study explored distinct patterns in their encrypted data. We introduced a novel method using Deep Packet Inspection and machine learning to differentiate between Tor and nonTor traffic based solely on encrypted payload. In the first strand of our research, we investigated hex character analysis of the Tor and nonTor encrypted payloads through statistical testing across 8 groups of application types. Remarkably, our investigation revealed a significant differentiation rate of 94.53% between Tor and nonTor traffic. In the second strand of our research, we aimed to distinguish Tor and nonTor traffic using machine learning, based on encrypted payload features. This proposed feature-based approach proved effective, as evidenced by our classification performance, which attained an average accuracy rate of 95.65% across these 8 groups of applications. Thereby, this study contributes to the efficient classification of Tor and nonTor traffic through features derived solely from a single encrypted payload packet, independent of its position in the traffic flow.
first_indexed 2024-03-08T03:13:44Z
format Article
id doaj.art-60dbc33316f24c3ea83cfa4e25e10838
institution Directory of Open Access Journals
issn 2169-3536
language English
last_indexed 2024-03-08T03:13:44Z
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-60dbc33316f24c3ea83cfa4e25e10838; 2024-02-13T00:00:54Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12; pp. 19418-19431; DOI 10.1109/ACCESS.2024.3356073; 10409147
Classifying Tor Traffic Encrypted Payload Using Machine Learning
Pitpimon Choorod (https://orcid.org/0000-0002-9279-0710), George Weir (https://orcid.org/0000-0002-6264-4480), Anil Fernando
Department of Computer and Information Sciences, University of Strathclyde, Glasgow, U.K. (all three authors)
https://ieeexplore.ieee.org/document/10409147/
Network traffic classification; Tor network; machine learning; encrypted payload features; character analysis
title Classifying Tor Traffic Encrypted Payload Using Machine Learning
topic Network traffic classification
Tor network
machine learning
encrypted payload features
character analysis
url https://ieeexplore.ieee.org/document/10409147/