Summary: | With a large number of sensors being connected to the internet, the scalability of the
Internet of Things (IoT) has started to hinge on edge computing: the ability to
partly process raw data at the sensor, on the edge of the network, instead of
transmitting all data to the cloud. However, sensor nodes are typically highly
power-constrained due to limited battery capacity and also require a long lifetime
because nodes are difficult to replace in many applications. Hence, this thesis focuses
on using different circuit and algorithmic techniques, in particular approximate
computing, near- and in-memory computing (IMC), and dynamic voltage and frequency
scaling (DVFS), to reduce the energy consumption of edge devices in the Internet
of Things.
As a first example, we choose predictive maintenance (PdM), one of the most important
applications pertaining to IoT in Industry 4.0. Machine learning is used
to predict the failure of a machine before the actual event occurs. However, the
main challenges in PdM are (a) lack of enough data from failing machines to train
binary classifiers, and (b) paucity of power and bandwidth to transmit sensor data
to the cloud throughout the lifetime of the machine. In our work, we propose an
anomaly detection scheme that can be trained only using healthy machine data.
Our Anomaly Detection based Power Saving (ADEPOS) scheme is aimed at saving
energy by using approximate computing throughout the lifetime of the machine. At
the beginning of the machine's life, when the probability of the machine being
healthy is high, low-accuracy computations are used. As time progresses and
anomalies are detected, the anomaly detector is switched to higher-accuracy
modes. Computation accuracy may be reduced in many ways,
such as reducing the number of neurons, reducing the bit width of data, or applying
dynamic voltage and frequency scaling. Tested on the NASA bearing dataset, ADEPOS
demonstrates up to 8.8x reduction in neurons on average over the lifetime of bearings.
This resulted in 8.95x energy savings for a microprocessor implementation and
~18.8x energy savings for an ASIC implementation, both in 65nm CMOS.
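
The escalation idea behind ADEPOS can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the thesis: the function `run_adaptive_detector`, the list of scoring functions ordered from cheapest to most accurate, the single `threshold`, and the choice never to step back down to a cheaper mode.

```python
from typing import Callable, List, Sequence


def run_adaptive_detector(
    samples: Sequence[float],
    detectors: List[Callable[[float], float]],
    threshold: float,
) -> List[int]:
    """Return the index of the detector mode used for each sample.

    `detectors` is ordered from cheapest (least accurate) to most
    expensive (most accurate). The loop starts in the cheapest mode and
    moves one step up whenever the current mode reports an anomaly score
    above `threshold`. Assumption: the mode never steps back down.
    """
    mode = 0
    used = []
    for x in samples:
        used.append(mode)              # mode that processes this sample
        score = detectors[mode](x)     # anomaly score from the current mode
        if score > threshold and mode < len(detectors) - 1:
            mode += 1                  # escalate to a higher-accuracy mode
    return used


# Toy usage: three stand-in "modes" that all score a sample by its magnitude.
toy_modes = [abs, abs, abs]
print(run_adaptive_detector([0.1, 0.2, 0.9, 0.3, 1.2, 1.5], toy_modes, 0.5))
```

In a real system each mode would differ in cost, for example in neuron count, bit width, or voltage and frequency setting; the toy modes above only exercise the control flow.
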
The second part of this research explores near- and in-memory computing (IMC)
to reduce the data movement between the storage and processing elements for video
processing in traffic surveillance applications. Generally, image frames from a
camera undergo image denoising, region proposal, object classification, and object
tracking steps for traffic surveillance and monitoring. However, realizing this
data-intensive computation on a traditional von Neumann architecture incurs
high energy dissipation and long execution time due to the enormous
data movement between computing and storage units. Further, for stationary
cameras, there exists significant temporal redundancy, which can be exploited by
event-driven or neuromorphic vision sensors (NVS) that report data only when
there is activity in the scene. However, due to the presence of noise, NVS pixels report
events even in the absence of actual activity. In this dissertation, 6T-SRAM
in-memory computing based image denoising for event-based binary image (EBBI)
frames from an NVS is presented. We propose a nonoverlap
median filter (NOMF), an approximation of a traditional median filter, for
image denoising. The NOMF enables us to implement image denoising by leveraging
the inherent read-disturb phenomenon of the 6T-SRAM. In addition, zero
frames are easily detected with IMC techniques by tracking the bit-line voltage during
the filtering operation, and this can be used to shut off the rest of the processor for ~2x
energy benefits in urban traffic settings. Fabricated in 65nm CMOS, this chip
produces denoised frames with an energy efficiency of 51.3 TOPS/W and a peak
throughput of 134.4 GOPS at 70MHz.
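
As a software illustration only, one plausible reading of a nonoverlap median filter on a binary frame is sketched below in Python. The abstract does not define the NOMF precisely, so the following are assumptions: the frame is tiled into non-overlapping `p` x `p` patches, each patch is replaced by its majority value (the median of binary pixels), and the helper names `nonoverlap_median_filter` and `is_zero_frame` are hypothetical. The sketch does not model the 6T-SRAM read-disturb mechanism or bit-line voltage tracking.

```python
from typing import List

Frame = List[List[int]]  # binary image: rows of 0/1 pixels


def nonoverlap_median_filter(frame: Frame, p: int = 3) -> Frame:
    """Replace each non-overlapping p x p patch by its majority value."""
    rows, cols = len(frame), len(frame[0])
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, p):
        for c0 in range(0, cols, p):
            # Pixels of this patch (edge patches may be smaller than p x p).
            cells = [
                (r, c)
                for r in range(r0, min(r0 + p, rows))
                for c in range(c0, min(c0 + p, cols))
            ]
            ones = sum(frame[r][c] for r, c in cells)
            if 2 * ones > len(cells):  # majority of 1s: binary median is 1
                for r, c in cells:
                    out[r][c] = 1
    return out


def is_zero_frame(frame: Frame) -> bool:
    """True when no pixel is set, so later stages could be skipped."""
    return not any(any(row) for row in frame)


# Toy usage: a 3x3 object in the top-left corner plus one isolated noise pixel.
noisy = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
clean = nonoverlap_median_filter(noisy)
print(is_zero_frame(clean))
```

Because patches do not overlap, each pixel is visited once, unlike a sliding-window median filter that revisits every pixel for each window containing it.
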
As a next step, we propose a 9T-SRAM near- and in-memory computing based
region proposal network for event-based binary image frames to exploit spatial
redundancy in the valid frames. The region proposal network finds the bounding
box encapsulating an object, which reduces the computation of an object
recognition deep neural network (DNN) by confining the computation to the region
surrounding the object instead of the whole image frame. The proposed 9T-SRAM
cell enables a 1-D projection of objects onto the horizontal and vertical axes of an
image. An iterative and selective search of the rising and falling edges of the 1-D
projection yields the coordinates of a bounding box encapsulating an object. Simulated
in 65nm CMOS, this chip produces up to 16 region proposals per frame
and achieves ~682x energy savings compared to a digitally implemented connected
component labeling (CCL) algorithm, with a throughput of 1.17 frames/µs
at 200MHz.
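
The projection-based search can be sketched in software as follows. This is a simplified Python illustration, not the 9T-SRAM circuit or the exact search order used in the thesis: the helper names `runs` and `propose_regions` are hypothetical, the projection is taken as a logical OR along each axis, and the search refines each horizontal interval only once in the vertical direction. The cap of 16 boxes mirrors the figure quoted above.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]           # binary image: rows of 0/1 pixels
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), inclusive coordinates


def runs(profile: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of each run of True values."""
    out, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i                    # rising edge of the projection
        elif not v and start is not None:
            out.append((start, i - 1))   # falling edge of the projection
            start = None
    if start is not None:                # run reaches the end of the profile
        out.append((start, len(profile) - 1))
    return out


def propose_regions(frame: Frame, max_boxes: int = 16) -> List[Box]:
    """Propose bounding boxes from 1-D projections of a binary frame."""
    cols = len(frame[0])
    # Projection onto the horizontal axis: is any pixel set in this column?
    x_profile = [any(row[c] for row in frame) for c in range(cols)]
    boxes: List[Box] = []
    for x0, x1 in runs(x_profile):
        # Projection onto the vertical axis, restricted to this column span.
        y_profile = [any(row[x0:x1 + 1]) for row in frame]
        for y0, y1 in runs(y_profile):
            boxes.append((x0, y0, x1, y1))
    return boxes[:max_boxes]


# Toy usage: two separated objects in a 5 x 7 frame.
frame = [
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
]
print(propose_regions(frame))
```

A single pass like this can merge objects that overlap in one projection; repeating the horizontal and vertical steps inside each box is one way to separate them further.
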
In summary, we present a set of algorithms and hardware solutions for energy-efficient
edge computing that use approximate and in-memory computing techniques.
We demonstrate the results in two different applications: predictive maintenance
and traffic monitoring.
|