A Compression-Based Method for Detecting Anomalies in Textual Data

Nowadays, information and communications technology systems are fundamental assets of our social and economic model, and thus they should be properly protected against the malicious activity of cybercriminals. Defence mechanisms are generally articulated around tools that trace and store informati...

Bibliographic Details
Main Authors: Gonzalo de la Torre-Abaitua, Luis Fernando Lago-Fernández, David Arroyo
Format: Article
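Language: English
Published: MDPI AG 2021-05-01
Series: Entropy
Subjects: intrusion detection systems; anomaly detection; normalized compression distance; text mining; data-driven security
Online Access: https://www.mdpi.com/1099-4300/23/5/618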
_version_ 1797533888431521792
author Gonzalo de la Torre-Abaitua
Luis Fernando Lago-Fernández
David Arroyo
collection DOAJ
description Information and communications technology systems are fundamental assets of our social and economic model, and they should therefore be properly protected against the malicious activity of cybercriminals. Defence mechanisms are generally built around tools that trace and store information in several ways, the simplest being the generation of plain text files known as security logs. Security analysts usually inspect such log files in a semi-automatic way to detect events that may affect system integrity, confidentiality and availability. On this basis, we propose a parameter-free method to detect security incidents from structured text regardless of its nature. We use the Normalized Compression Distance to obtain a set of features that a Support Vector Machine can use to classify events from a heterogeneous cybersecurity environment. In particular, we explore and validate the application of our method in four different cybersecurity domains: HTTP anomaly identification, spam detection, tracking of Domain Generation Algorithms, and sentiment analysis. The results show the validity and flexibility of our approach in different security scenarios with a low configuration burden.
first_indexed 2024-03-10T11:22:04Z
format Article
id doaj.art-7f474bae0e374549abdceacb3cd5f65a
institution Directory Open Access Journal
issn 1099-4300
language English
last_indexed 2024-03-10T11:22:04Z
publishDate 2021-05-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj.art-7f474bae0e374549abdceacb3cd5f65a; 2023-11-21T19:58:26Z; eng; MDPI AG; Entropy; 1099-4300; 2021-05-01; 23(5), 618; 10.3390/e23050618; A Compression-Based Method for Detecting Anomalies in Textual Data
Gonzalo de la Torre-Abaitua (Departamento de Ingeniería Informática, Escuela Politécnica Superior, Universidad Autónoma de Madrid, 28049 Madrid, Spain)
Luis Fernando Lago-Fernández (Departamento de Ingeniería Informática, Escuela Politécnica Superior, Universidad Autónoma de Madrid, 28049 Madrid, Spain)
David Arroyo (Institute of Physical and Information Technologies (ITEFI), Spanish National Research Council (CSIC), 28006 Madrid, Spain)
https://www.mdpi.com/1099-4300/23/5/618
intrusion detection systems; anomaly detection; normalized compression distance; text mining; data-driven security
title A Compression-Based Method for Detecting Anomalies in Textual Data
topic intrusion detection systems
anomaly detection
normalized compression distance
text mining
data-driven security
url https://www.mdpi.com/1099-4300/23/5/618
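
The description above mentions using the Normalized Compression Distance (NCD) to build features for a classifier. The following is a minimal sketch of that idea, not the authors' implementation. It assumes zlib as the compressor and uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The function names and the reference strings are hypothetical and chosen only for illustration; the record does not say which compressor or reference texts the article uses.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two strings."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation of both inputs
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_features(sample: str, references: list[str]) -> list[float]:
    """One feature per reference text: the NCD between the sample and that text."""
    return [ncd(sample, ref) for ref in references]


# Hypothetical reference texts: one benign-looking and one suspicious-looking request.
references = [
    "GET /index.html HTTP/1.1 " * 20,
    "GET /search?q=' OR 1=1 -- HTTP/1.1 " * 20,
]

# A sample identical to the first reference should sit closer to it than to the second.
features = ncd_features("GET /index.html HTTP/1.1 " * 20, references)
print(features)
```

Feature vectors of this kind could then be passed to a Support Vector Machine, as the description states; the choice of SVM library and its settings is left open here.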