Security vulnerabilities in speech recognition systems

Bibliographic Details
Main Author: Bispham, M
Format: Report
Published: Centre for Doctoral Training in Cyber Security 2016
Description
Summary: The aim of this project is to develop a generic methodology for blackbox testing of speech recognition systems by seeking to identify homophones or similar-sounding words which may be misrecognised by a system as command words. A series of experiments is performed with two different speech recognition systems. The data used in the experiments are a set of words from the vocabulary of a controlled natural language, Attempto Controlled English, together with a set of words not permitted in that controlled language. Several instances are identified where a ‘legal’ Attempto word is misrecognised as an ‘illegal’ word. This demonstrates the feasibility of an attack whereby an attacker might seek to find apparently innocuous words which are misrecognised by a speech-controlled system as commands, thus enabling the attacker covertly to prompt the system to perform an unauthorised action. In such a situation, the attacker would identify a set of command or ‘target’ words used to control a system (the equivalent of the ‘illegal’ words in the experiments with Attempto) and would seek to find a set of ‘adversarial’ words, each of which is misrecognised by the system as a target word. In a real-life attack, an attacker might seek to find words which are misrecognised as command words for a digital assistant such as Siri or Cortana, or as command words for a voice-controlled device in the Internet of Things.
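
The search described in the summary can be pictured with a minimal Python sketch. It is an illustration under stated assumptions and not the report's actual tooling: the function name find_adversarial_words, the recognise callable and the placeholder words are all hypothetical, and a real test would replace the stand-in recogniser with calls to the speech recognition system under examination.

```python
from typing import Callable, Iterable, List, Tuple


def find_adversarial_words(
    candidate_words: Iterable[str],
    target_words: Iterable[str],
    recognise: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (spoken word, recognised target word) pairs.

    `recognise` is a hypothetical black-box callable: it takes the word
    that was spoken and returns the system's transcript of it.
    """
    targets = {w.lower() for w in target_words}
    hits = []
    for word in candidate_words:
        heard = recognise(word).strip().lower()
        # A hit is a candidate that is not itself a target word
        # but is transcribed by the system as one.
        if word.lower() not in targets and heard in targets:
            hits.append((word, heard))
    return hits


# Stand-in recogniser with invented placeholder transcripts (not real results).
FAKE_TRANSCRIPTS = {"alpha": "alpha", "beta": "delete", "gamma": "gamma"}


def fake_recognise(word: str) -> str:
    # Words without an entry are assumed to be transcribed correctly.
    return FAKE_TRANSCRIPTS.get(word, word)


hits = find_adversarial_words(
    ["alpha", "beta", "gamma"],  # 'legal' candidate words (placeholders)
    ["delete", "open"],          # 'illegal' or target words (placeholders)
    fake_recognise,
)
print(hits)
```

Treating the recogniser as an opaque callable reflects the blackbox setting: only the spoken input and the resulting transcript are observed, never the system's internals.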