Online Data Center Traffic Classification Based on Inter-Flow Correlations

Today, increasing attention is being paid to Data Center (DC) traffic classification since these infrastructures have become the heart of a variety of time-sensitive and data-intensive service platforms. Classification provides the required tools for better understanding traffic patterns in order to...

Bibliographic Details
Main Authors: Meriem Amina Si Saber, Mehdi Ghorbani, Abdolkhalegh Bayati, Kim-Khoa Nguyen, Mohamed Cheriet
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
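Subjects: Data center; network traffic classification; interflow correlation; ensemble algorithms; random forest; data imbalance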
Online Access: https://ieeexplore.ieee.org/document/9047871/
author Meriem Amina Si Saber
Mehdi Ghorbani
Abdolkhalegh Bayati
Kim-Khoa Nguyen
Mohamed Cheriet
author_sort Meriem Amina Si Saber
collection DOAJ
description Today, increasing attention is being paid to Data Center (DC) traffic classification, since these infrastructures have become the heart of a variety of time-sensitive and data-intensive service platforms. Classification provides the tools needed to better understand traffic patterns in order to ensure high Quality of Service (QoS) performance and solve scalability problems. Unfortunately, existing classification algorithms cannot deal efficiently with two critical challenges in DC traffic: inter-class imbalance and critical time constraints. In this paper, we propose a novel correlation-based algorithm that follows a cost-sensitive approach combined with a Bagged Random Forest (BRF) ensemble algorithm, to address the inter-class imbalance problem while meeting time requirements in a data center environment. In this strategy, a new method based on Reverse k-Nearest Neighbors (RkNN) is proposed to capture the rebalancing weights expressing inter-flow correlations, enabling online classification. We demonstrate the efficiency of the algorithm by comparing its performance with that of several existing methods drawn from data-level, algorithm-level, and cost-sensitive strategies on four real-world datasets. The results reveal that the proposed algorithm outperforms most approaches across the datasets in terms of precision, recall, F1 measure, AUC and Kappa, whereas the other algorithms yield either high precision with low recall or low precision with high recall, causing congestion or resource over-provisioning.
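The description above names Reverse k-Nearest Neighbors (RkNN) as the source of the rebalancing weights but gives no formula. The following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm: it counts how many samples list each sample among their k nearest neighbors, then combines the same-class count with the inverse class size. The function names, the toy feature vectors, the class labels "A" and "B", and the weighting formula itself are all hypothetical.

```python
import math
from collections import Counter


def knn_indices(points, k):
    """For each point, return the indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort the other points by Euclidean distance (ties broken by index).
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def rknn_counts(points, k):
    """For each point, count how many other points have it in their k-NN list."""
    counts = [0] * len(points)
    for nbrs in knn_indices(points, k):
        for j in nbrs:
            counts[j] += 1
    return counts


def rebalancing_weights(points, labels, k):
    """Hypothetical weight: (1 + same-class reverse neighbors) / class size.

    This formula is an illustrative assumption, not taken from the paper.
    """
    class_size = Counter(labels)
    same_class_rknn = [0] * len(points)
    for i, nbrs in enumerate(knn_indices(points, k)):
        for j in nbrs:
            if labels[i] == labels[j]:
                same_class_rknn[j] += 1
    return [
        (1 + same_class_rknn[i]) / class_size[labels[i]]
        for i in range(len(points))
    ]


# Toy flow feature vectors (hypothetical): three samples of class "A", two of "B".
flows = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0), (11.5, 0.0)]
labels = ["A", "A", "A", "B", "B"]

counts = rknn_counts(flows, k=1)
weights = rebalancing_weights(flows, labels, k=1)
print(counts)
print(weights)
```

Per-sample weights of this kind could then be handed to a cost-sensitive ensemble learner, for example through the `sample_weight` argument of scikit-learn's `RandomForestClassifier.fit`; whether that matches the BRF setup used in the paper is not stated in this record.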
first_indexed 2024-12-19T07:59:10Z
format Article
id doaj.art-b3c5840261eb450abb3aea5af3c91682
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-19T07:59:10Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-b3c5840261eb450abb3aea5af3c91682
2022-12-21T20:29:55Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
Volume 8, pages 60401-60416
DOI 10.1109/ACCESS.2020.2983605
IEEE document 9047871
Online Data Center Traffic Classification Based on Inter-Flow Correlations
Meriem Amina Si Saber https://orcid.org/0000-0002-3826-6758
Mehdi Ghorbani https://orcid.org/0000-0003-4519-8463
Abdolkhalegh Bayati https://orcid.org/0000-0001-7243-4594
Kim-Khoa Nguyen https://orcid.org/0000-0002-9354-7544
Mohamed Cheriet https://orcid.org/0000-0002-5246-7265
École de Technologie Supérieure (ÉTS), University of Quebec, Montreal, QC, Canada (affiliation of all five authors)
https://ieeexplore.ieee.org/document/9047871/
Data center; network traffic classification; interflow correlation; ensemble algorithms; random forest; data imbalance
title Online Data Center Traffic Classification Based on Inter-Flow Correlations
topic Data center
network traffic classification
interflow correlation
ensemble algorithms
random forest
data imbalance
url https://ieeexplore.ieee.org/document/9047871/