Energy-scalable speech recognition circuits

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.

Bibliographic Details
Main Author: Price, Michael, Ph. D. (Michael R.). Massachusetts Institute of Technology
Other Authors: Anantha Chandrakasan and James Glass.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2016
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/106090
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (pages 135-141).
Physical Description: 141 pages (application/pdf)
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See http://dspace.mit.edu/handle/1721.1/7582 for inquiries about permission.

Abstract:
As people become more comfortable speaking to machines, speech interfaces will spread to a wider range of devices, such as wearables, appliances, and robots. Automatic speech recognition (ASR) is a key component of these interfaces, and it is computationally intensive. This thesis shows how we designed special-purpose integrated circuits that bring local ASR capabilities to electronic devices with a small size and power footprint.

This thesis adopts a holistic, system-driven approach to ASR hardware design. We identify external memory bandwidth as the main driver of system power consumption and select algorithms and architectures to minimize it. We evaluate three acoustic modeling approaches, namely Gaussian mixture models (GMMs), subspace GMMs (SGMMs), and deep neural networks (DNNs), and identify tradeoffs between memory bandwidth and recognition accuracy. DNNs offer the best tradeoffs for our application; we describe a SIMD DNN architecture that uses parameter quantization and sparse weight matrices to save bandwidth. We also present a hidden Markov model (HMM) search architecture using a weighted finite-state transducer (WFST) representation. Enhancements to the search architecture, including WFST compression and caching, predictive beam width control, and a word lattice, reduce memory bandwidth to 10 MB/s or less, even though the system has just 414 kB of on-chip SRAM. The resulting system runs in real time with accuracy comparable to a software recognizer using the same models. We provide infrastructure for deploying recognizers trained with open-source tools (Kaldi) on the hardware platform.

We investigate voice activity detection (VAD) as a wake-up mechanism and conclude that an accurate and robust algorithm is necessary to minimize system power, even if it results in larger area and power for the VAD itself. We design fixed-point digital implementations of three VAD algorithms and evaluate their performance on two synthetic tasks with SNRs from -5 to 30 dB. The best algorithm uses modulation frequency features with a neural network (NN) classifier and requires just 8.9 kB of parameters.

Throughout this work we emphasize energy scalability, that is, the ability to save energy when high accuracy or complex models are not required. Our architecture exploits scalability from many sources: model hyperparameters, runtime parameters such as beam width, and voltage/frequency scaling. We demonstrate these concepts with results from five ASR tasks, with vocabularies ranging from 11 to 145,000 words.
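The abstract names beam width as a runtime parameter that trades accuracy for energy. The sketch below is a generic illustration only. It is not the thesis's hardware design or its predictive beam width controller. It applies standard beam pruning to a set of search hypotheses: any hypothesis whose accumulated cost exceeds the best cost by more than the beam is dropped. A narrower beam therefore leaves fewer active states, and presumably fewer WFST arcs to fetch from memory on the next frame. The `Hypothesis` type and the `prune` function are hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical active search hypothesis, used only for illustration."""
    state: int    # WFST state ID
    cost: float   # accumulated negative log-likelihood (lower is better)


def prune(hyps: list[Hypothesis], beam: float) -> list[Hypothesis]:
    """Generic beam pruning (hypothetical helper): keep hypotheses within
    `beam` of the best (lowest) cost.

    A narrower beam keeps fewer active states, so fewer WFST arcs would be
    read from memory when the search expands the next frame.
    """
    if not hyps:
        return []
    best = min(h.cost for h in hyps)
    threshold = best + beam
    return [h for h in hyps if h.cost <= threshold]


if __name__ == "__main__":
    costs = [10.0, 10.5, 12.0, 15.0, 20.0]
    hyps = [Hypothesis(state, cost) for state, cost in enumerate(costs)]
    for beam in (0.0, 2.0, 6.0):
        kept = prune(hyps, beam)
        print(f"beam={beam}: {len(kept)} active, states {[h.state for h in kept]}")
```

With costs 10.0, 10.5, 12.0, 15.0, and 20.0, a beam of 2.0 keeps three hypotheses, while a beam of 6.0 keeps four. This is the accuracy-versus-work knob that the abstract lists among its sources of energy scalability.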