Energy-scalable speech recognition circuits

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.

Bibliographic Details
Main Author:	Price, Michael, Ph. D. (Michael R.). Massachusetts Institute of Technology
Other Authors:	Anantha Chandrakasan and James Glass.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2016
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/106090

_version_	1826200389736726528
author	Price, Michael, Ph. D. (Michael R.). Massachusetts Institute of Technology
author2	Anantha Chandrakasan and James Glass.
author_facet	Anantha Chandrakasan and James Glass. Price, Michael, Ph. D. (Michael R.). Massachusetts Institute of Technology
author_sort	Price, Michael, Ph. D. (Michael R.). Massachusetts Institute of Technology
collection	MIT
description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
first_indexed	2024-09-23T11:35:39Z
format	Thesis
id	mit-1721.1/106090
institution	Massachusetts Institute of Technology
language	eng
last_indexed	2024-09-23T11:35:39Z
publishDate	2016
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/106090 2019-04-12T17:27:40Z Energy-scalable speech recognition circuits Price, Michael, Ph. D. (Michael R.). Massachusetts Institute of Technology Anantha Chandrakasan and James Glass. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. Cataloged from PDF version of thesis. Includes bibliographical references (pages 135-141). As people become more comfortable with speaking to machines, the applications of speech interfaces will diversify and include a wider range of devices, such as wearables, appliances, and robots. Automatic speech recognition (ASR) is a key component of these interfaces that is computationally intensive. This thesis shows how we designed special-purpose integrated circuits to bring local ASR capabilities to electronic devices with a small size and power footprint. This thesis adopts a holistic, system-driven approach to ASR hardware design. We identify external memory bandwidth as the main driver in system power consumption and select algorithms and architectures to minimize it. We evaluate three acoustic modeling approaches, Gaussian mixture models (GMMs), subspace GMMs (SGMMs), and deep neural networks (DNNs), and identify tradeoffs between memory bandwidth and recognition accuracy. DNNs offer the best tradeoffs for our application; we describe a SIMD DNN architecture using parameter quantization and sparse weight matrices to save bandwidth. We also present a hidden Markov model (HMM) search architecture using a weighted finite-state transducer (WFST) representation.
Enhancements to the search architecture, including WFST compression and caching, predictive beam width control, and a word lattice, reduce memory bandwidth to 10 MB/s or less, despite having just 414 kB of on-chip SRAM. The resulting system runs in real time with accuracy comparable to a software recognizer using the same models. We provide infrastructure for deploying recognizers trained with open-source tools (Kaldi) on the hardware platform. We investigate voice activity detection (VAD) as a wake-up mechanism and conclude that an accurate and robust algorithm is necessary to minimize system power, even if it results in larger area and power for the VAD itself. We design fixed-point digital implementations of three VAD algorithms and explore their performance on two synthetic tasks with SNRs from -5 to 30 dB. The best algorithm uses modulation frequency features with an NN classifier, requiring just 8.9 kB of parameters. Throughout this work we emphasize energy scalability, or the ability to save energy when high accuracy or complex models are not required. Our architecture exploits scalability from many sources: model hyperparameters, runtime parameters such as beam width, and voltage/frequency scaling. We demonstrate these concepts with results from five ASR tasks, with vocabularies ranging from 11 words to 145,000 words. by Michael Price. Ph. D. 2016-12-22T16:28:36Z 2016-12-22T16:28:36Z 2016 2016 Thesis http://hdl.handle.net/1721.1/106090 965382032 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 141 pages application/pdf Massachusetts Institute of Technology
spellingShingle	Electrical Engineering and Computer Science. Price, Michael, Ph. D. (Michael R.). Massachusetts Institute of Technology Energy-scalable speech recognition circuits
title	Energy-scalable speech recognition circuits
title_sort	energy scalable speech recognition circuits
topic	Electrical Engineering and Computer Science.
url	http://hdl.handle.net/1721.1/106090
