A hybrid heuristic-statistical peer-to-peer traffic classifier

Bibliographic Details
Main Author: Hassan Hamid, Mussab Mustafa
Format: Thesis
Published: 2010
Description
Summary: Peer-to-Peer (P2P) traffic classification remains an open research problem because building an optimal classifier is difficult. This work proposes a novel hybrid approach that combines heuristic and statistical classification of P2P traffic. The heuristic approach provides high accuracy, but it must correlate many packets and flows over a period of time, which makes it unsuitable for online classification. Statistical classification, on the other hand, can classify traffic online but needs periodic manual retraining. The proposed solution combines the two so that each compensates for the other's weaknesses. The system consists of two modules: offline learning and online statistical classification. In the first module, heuristics classify the flows in the traces into three classes, two of which are used to train the online statistical classifier. In the online module, machine learning (ML) algorithms classify traffic on the fly. This work also enhances an existing heuristic classification technique by adding a new class. The proposed system is evaluated on 22 traffic traces, downloaded from various shared resources and captured on the Universiti Teknologi Malaysia (UTM) campus network between March and June 2010. In the offline (heuristic) phase, adding the third class improves accuracy from 93% to 98%, so this module can supply high-quality examples for training the online statistical classifier. For the online statistical classifier, 64 ML algorithms are investigated; in-depth analysis shows that Decision Tree algorithms give the best results in both accuracy and processing time. Trained on examples generated by the heuristic classifier, the statistical classifier achieves an overall accuracy of 99% on the downloaded and captured UTM traces.
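The two-module design described in the summary can be illustrated with a small sketch. Everything concrete in it is an assumption rather than something taken from the thesis: the heuristic rules (a hypothetical well-known-port list, plus a rule that flags host pairs communicating over both TCP and UDP as P2P), the two statistical features (mean packet size and packet count), the `Flow` record, and the toy trace. A hand-rolled CART-style tree using Gini impurity stands in for the Decision Tree learners the summary reports as best; it is not one of the 64 algorithms the thesis evaluated. The offline step assigns each flow one of three labels and trains only on the two confident ones (P2P and NON_P2P), discarding UNKNOWN.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical list of ports treated as known non-P2P services (not from the thesis).
WELL_KNOWN_PORTS = {25, 53, 80, 110, 143, 443}


@dataclass
class Flow:
    src: str
    dst: str
    proto: str            # "TCP" or "UDP"
    dport: int
    mean_pkt_size: float  # bytes
    pkt_count: int


def heuristic_label(flows):
    """Offline module: label each flow P2P, NON_P2P or UNKNOWN (hypothetical rules)."""
    protos = {}
    for f in flows:
        protos.setdefault(frozenset((f.src, f.dst)), set()).add(f.proto)
    labels = []
    for f in flows:
        if f.dport in WELL_KNOWN_PORTS:
            labels.append("NON_P2P")
        elif protos[frozenset((f.src, f.dst))] == {"TCP", "UDP"}:
            labels.append("P2P")  # host pair talks over both TCP and UDP
        else:
            labels.append("UNKNOWN")
    return labels


def features(f):
    # Statistical features only: ports and addresses never reach the online classifier.
    return (f.mean_pkt_size, float(f.pkt_count))


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=3):
    """Tiny CART-style tree: a leaf label or a (feature, threshold, left, right) tuple."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[j] <= t]
            right = [lab for x, lab in zip(X, y) if x[j] > t]
            if not left or not right:
                continue
            # Weighted Gini impurity of the two children; lower is better.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        return majority
    _, j, t = best
    left_rows = [(x, lab) for x, lab in zip(X, y) if x[j] <= t]
    right_rows = [(x, lab) for x, lab in zip(X, y) if x[j] > t]
    return (j, t,
            build_tree([x for x, _ in left_rows], [lab for _, lab in left_rows], depth + 1, max_depth),
            build_tree([x for x, _ in right_rows], [lab for _, lab in right_rows], depth + 1, max_depth))


def predict(tree, x):
    """Online module: walk the tree using only the flow's statistical features."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


def train_online_classifier(flows):
    """Train on heuristic labels, keeping only the two confident classes."""
    labeled = [(features(f), lab) for f, lab in zip(flows, heuristic_label(flows))
               if lab != "UNKNOWN"]
    return build_tree([x for x, _ in labeled], [lab for _, lab in labeled])


# Toy trace with made-up values, for illustration only.
trace = [
    Flow("A", "B", "TCP", 6881, 1300.0, 800),
    Flow("A", "B", "UDP", 6881, 1100.0, 300),
    Flow("C", "D", "TCP", 4662, 1250.0, 900),
    Flow("C", "D", "UDP", 4672, 1050.0, 200),
    Flow("E", "F", "TCP", 80, 600.0, 50),
    Flow("E", "G", "UDP", 53, 90.0, 2),
    Flow("H", "F", "TCP", 443, 700.0, 60),
    Flow("I", "J", "TCP", 5000, 400.0, 10),
]
tree = train_online_classifier(trace)
print(predict(tree, (1200.0, 450.0)))  # P2P
print(predict(tree, (150.0, 5.0)))     # NON_P2P
```

In practice a library decision-tree learner would replace `build_tree`. What the sketch is meant to show is the data flow: labels produced offline by the heuristics become training examples, and the online stage sees only per-flow statistics, so it can label each flow as it arrives without the cross-flow correlation the heuristics depend on.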