Identifying IOT-like devices and using Collaborative XAI to understand their cyber security behaviour



Bibliographic Details
Main Authors: Moyle, S, Martin, A, Allott, N
Format: Report
Language: English
Published: University of Oxford, Department of Computer Science 2023
author Moyle, S
Martin, A
Allott, N
collection OXFORD
description <p>Cyber attacking is easier than cyber defending: attackers need to find only one breach, while defenders must successfully repel every attack. This work demonstrates how cyber defenders can increase their capabilities by joining forces with eXplainable-AI (XAI) through interactive human-machine collaboration. Effective collaboration requires that each party can communicate with the other and teach it novel concepts. This in turn requires a common, shared, and comprehensible language for both the Human Cyber Defender and the XAI. Here we use the language and power of logic, which was also Alan Turing's preferred choice in his specification for constructing a machine intelligence. Unlike Deep Neural Network approaches, Machine Learning based on logic allows 1) pre-existing Human knowledge to be easily codified for the Machine learner to draw on, and 2) any new patterns discovered by the Machine learner to be communicated back to the Human. Inductive Logic Programming (ILP) is an XAI paradigm that provides powerful machine learning and has a track record of producing patterns that have been published as scientific discoveries in numerous applications. We utilise and extend ILP into an interactive Machine learner, so that it can collaborate with expert cyber defenders to analyse network security data, together with the defenders' existing knowledge, and produce logical specifications of the behaviour of network-attached devices. Two commonly deployed device security approaches are: A) Endpoint or host-based, where measurements taken directly from the device itself are used in making security decisions; and B) Network Monitoring (NetMon), where eavesdropping on the communications that a network device makes is used to make security decisions. Both techniques have advantages and disadvantages. 
We take the pragmatic approach of using network monitoring, as it is the least disruptive to an existing network of operational devices, including IoT devices that lack endpoint support. The primary data source is network flow meta-data captured on a live business network with up to approximately 180 devices attached, gathered over a period of almost six months. A Turing Machine permits a countably infinite range of behaviour, so any particular device may be observed performing a potentially very large number of distinct communications. A cyber defender is therefore interested in which devices have more volatile behaviour (and are hence more difficult to defend), and which devices are less volatile (and potentially easier to defend). This work proposes a volatility metric based on the (simple) Good-Turing Frequency Estimator (SGT), and applies it to the data collected from the live business network. Over the observed period, the average SGT metric varies between devices by 28 orders of magnitude from lowest to highest. The metric also identifies devices whose volatility is consistently low. It is conjectured that devices with low SGT volatility are easier to defend. The rapidly growing population of deployed IoT devices is an increasing challenge for cyber defenders. The security posture of IoT devices tends to be poor, often due to the limited compute power of the devices combined with the immaturity of IoT security engineering. There is a real need to identify, segregate, and defend IoT devices, so that they are not an easy target for network compromise. Fortunately, IoT devices typically perform simple operations repeatedly, so we hypothesise that IoT devices have a lower volatility than many other devices (e.g. a human's workstation). Empirically, this hypothesis is supported by the analysis of the network flow meta-data from the operational business network. 
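The idea of a Good-Turing-based volatility score can be illustrated with a minimal sketch. The abstract does not give the report's actual SGT metric, so the code below is an assumption: it uses only the basic Good-Turing estimate of the probability mass of unseen events (the number of flow types seen exactly once, divided by the total number of flows), without the smoothing step of the Simple Good-Turing method. The function name `unseen_mass`, the flow tuples, and the addresses are hypothetical.

```python
from collections import Counter


def unseen_mass(observations):
    """Basic Good-Turing estimate of the probability that the next
    observation is of a type never seen before: N1 / N, where N1 is the
    number of types seen exactly once and N is the total number of
    observations. Used here as a rough proxy for volatility (an
    assumption, not the report's metric)."""
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    seen_once = sum(1 for c in counts.values() if c == 1)
    return seen_once / total


# Hypothetical flow records: (source, destination, destination port).
# An IoT-like device repeats one flow almost all of the time.
iot_flows = [("10.0.0.7", "203.0.113.5", 443)] * 98 + [
    ("10.0.0.7", "203.0.113.9", 123),
    ("10.0.0.7", "203.0.113.10", 53),
]

# A workstation-like device contacts a different destination every time.
workstation_flows = [
    ("10.0.0.20", f"198.51.100.{i}", 443) for i in range(50)
]
```

On this made-up data, `unseen_mass(iot_flows)` is 0.02 and `unseen_mass(workstation_flows)` is 1.0, so the repetitive device scores as far less volatile.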
First we generalise the network flow meta-data from a single day of an IoT-like device (i.e. one with consistently low SGT volatility) using the SEQUITUR algorithm to produce a form of grammar. The SEQUITUR grammars are displayed using a novel visualisation that allows a Human defender to identify particular sequences of behaviour that recur for the device. Next, a specification of the recurring behaviour of devices with low SGT volatility is logically reverse-engineered using the XAI Human-Machine collaboration system Acuity (based on the popular ILP system Aleph). Acuity combines i) network flow data, ii) the induced SEQUITUR grammar of the device's network flows, and iii) Human cyber defender wisdom, all encoded in the logic programming language Prolog. This reverse engineering is a two-way collaborative process in which the Human cyber defender guides the Machine learner in its use of the data. The Machine learner proposes the best hypothesis (specification of device behaviour) that it finds, while the Human cyber defender updates the learner with information that the defender knows or suspects, much of it drawn from their expertise and experience. Once a hypothesis is agreed on, it can be tested against more data from the same device (e.g. subsequent days), and then tested on devices of a similar nature. Ultimately, a satisfactory specification of a particular class of device can be deployed as a defensive white-list within a network router's firewall. Each device that is secured by such a system is one less device for the cyber defender to worry (so much) about. Cyber defenders are a globally scarce resource. Each device that we adequately secure makes for less work for the cyber defenders. 
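How a grammar can expose recurring flow sequences may be easier to see in a small sketch. The code below is not SEQUITUR itself: as a simpler stand-in, it uses offline digram replacement (in the style of Re-Pair), repeatedly replacing the most frequent adjacent pair of symbols with a new rule. The names `build_grammar` and `expand`, the rule labels `R0`, `R1`, ..., and the flow labels are hypothetical, and the sketch assumes that no input symbol already uses an `R<n>` label.

```python
from collections import Counter


def build_grammar(sequence):
    """Compress a sequence by repeatedly replacing its most frequent
    adjacent pair with a fresh rule name (Re-Pair style, not SEQUITUR).
    Returns (start_sequence, rules), where rules maps name -> (left, right)."""
    seq = list(sequence)
    rules = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so nothing left to generalise
        name = f"R{len(rules)}"
        rules[name] = pair
        rewritten, i = [], 0
        while i < len(seq):
            # Replace non-overlapping occurrences, scanning left to right.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                rewritten.append(name)
                i += 2
            else:
                rewritten.append(seq[i])
                i += 1
        seq = rewritten
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the original terminal symbols."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return [symbol]


# Hypothetical day of flows for a device that repeats one short routine.
day = ["dns", "ntp", "https"] * 4
start, rules = build_grammar(day)
restored = [s for symbol in start for s in expand(symbol, rules)]
```

Each rule names a sub-sequence that recurs, which is the kind of structure a defender would look for in the visualisation, and expanding the start sequence reproduces the original day of flows exactly.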
This work defines a novel metric for the volatility of network communications between devices, identifies devices that have low volatility, applies a hierarchical grammar-building technique to identify sequences of network behaviour for individual devices, and uses XAI, in a logical framework, to reverse-engineer specifications of behaviour that can be deployed as part of a defensive system. This work empirically demonstrates that amplifying the skills of cyber defenders with an explainable AI Human-Machine learning system can improve both the understanding and the security of the behaviour of network IoT devices.</p>
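The deployment step, in which an agreed specification acts as a defensive white-list, can be sketched as follows. In the report the specifications are logic programs in Prolog produced with Acuity; the flat set lookup below is only a simplified stand-in, and the `Flow` fields, the host names, and the `violations` function are all hypothetical.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical summary of one outbound network flow."""
    dst_host: str
    dst_port: int
    protocol: str


def violations(observed, allowed):
    """Return the observed flows that the white-list does not permit."""
    return [flow for flow in observed if flow not in allowed]


# Assumed white-list for one class of IoT device.
allowed_flows = {
    Flow("ntp.example.org", 123, "udp"),
    Flow("cloud.example.com", 443, "tcp"),
}

observed_flows = [
    Flow("ntp.example.org", 123, "udp"),
    Flow("cloud.example.com", 443, "tcp"),
    Flow("198.51.100.9", 23, "tcp"),  # unexpected telnet connection
]

suspicious = violations(observed_flows, allowed_flows)
```

Here only the telnet flow falls outside the white-list, so it is the one flow a firewall built on this specification would block or flag.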
first_indexed 2024-03-07T08:02:29Z
format Report
id oxford-uuid:ae1353c9-97c8-4356-9f17-73c2633a740f
institution University of Oxford
language English
last_indexed 2024-03-07T08:02:29Z
publishDate 2023
publisher University of Oxford, Department of Computer Science
record_format dspace