Thesis Defense: Architecture Design for Highly Flexible and Energy-Efficient Deep Neural Network Accelerators

Speaker

Yu-Hsin Chen
CSAIL

Host

Prof. Vivienne Sze (Thesis Supervisor)
CSAIL

Abstract:

Deep neural networks (DNNs) are the backbone of modern artificial
intelligence (AI). While they deliver state-of-the-art accuracy in
numerous AI tasks, deploying DNNs into the field is still very
challenging due to their high computational complexity and diverse
shapes and sizes. Therefore, DNN accelerators that can achieve high
performance and energy efficiency across a wide range of DNNs are
critical for enabling AI in real-world applications.

In this thesis, we present Eyeriss, a hardware architecture for DNN
processing that is optimized for performance, energy efficiency and
flexibility. Eyeriss minimizes data movement, which is the bottleneck
of both performance and energy efficiency for DNNs, with a novel
dataflow, named row-stationary (RS). The RS dataflow supports
highly parallel processing while fully exploiting data reuse in a
multi-level memory hierarchy to optimize overall system energy
efficiency for any DNN shape and size. It demonstrates 1.4×–2.5×
higher energy efficiency than existing dataflows.
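To make the row-stationary idea concrete, here is a minimal, purely illustrative sketch (the function name and toy setup are ours, not from the thesis): each "PE" holds one filter row stationary in its local storage and performs a 1D convolution against a streamed input row, with partial sums accumulated across PEs to form an output row.

```python
import numpy as np

def conv2d_row_stationary(ifmap, weights):
    """Toy row-stationary mapping of a 2D convolution (no padding,
    stride 1). Each 'PE' keeps one filter row stationary and performs
    a 1D convolution with an input row; partial sums from the R PEs
    are accumulated into one output row, maximizing filter-row reuse."""
    R, S = weights.shape            # filter height / width
    H, W = ifmap.shape              # input height / width
    E, F = H - R + 1, W - S + 1     # output height / width
    ofmap = np.zeros((E, F))
    for e in range(E):              # one output row at a time
        for r in range(R):          # one 'PE' per filter row
            in_row = ifmap[e + r]   # input row streamed to the PE
            w_row = weights[r]      # filter row stays stationary
            for f in range(F):      # 1D convolution inside the PE
                ofmap[e, f] += np.dot(in_row[f:f + S], w_row)
    return ofmap
```

The point of the mapping is that the filter row is fetched once and reused across the whole input row, keeping data movement in the cheap, local level of the memory hierarchy.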

We present two versions of the Eyeriss architecture that support the
RS dataflow. Eyeriss v1 targets large DNNs that have plenty of data
reuse opportunities. It features a flexible mapping strategy that
increases the utilization of processing elements (PEs) for high
performance and a multicast on-chip network (NoC) that exploits data
reuse. It also exploits data sparsity to save 45% of PE power and
reduce off-chip bandwidth by 1.2×–1.9×. Fabricated in 65nm CMOS,
Eyeriss v1 consumes 278 mW at 34.7 fps on the CONV layers of AlexNet,
over 10× more energy efficient than a mobile GPU.
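The sparsity savings come from skipping work on zero-valued data (e.g. activations zeroed by ReLU). A hypothetical sketch of this zero-gating idea, with names of our own choosing:

```python
def sparse_mac(activations, weights):
    """Toy zero-gating: skip the multiply-accumulate whenever the
    input activation is zero, and count how many MACs were gated.
    In hardware, gating the PE datapath on zeros saves switching power."""
    acc, skipped = 0.0, 0
    for a, w in zip(activations, weights):
        if a == 0:
            skipped += 1    # datapath gated: no multiply performed
            continue
        acc += a * w
    return acc, skipped
```

The more zeros the activations contain, the larger the fraction of MACs (and the associated data movement) that can be avoided.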

Eyeriss v2 addresses the recent trend toward compact DNNs, whose
reduced size and computation introduce greater variation in the
amount of data reuse and sparsity. It has two key features: (1)
a flexible and scalable NoC that can provide high bandwidth when data
reuse is low while still being able to exploit data reuse when
available; (2) an improved dataflow, named RS Plus, that increases the
utilization of PEs. Together, they provide over 10× higher throughput
than Eyeriss v1. Eyeriss v2 also further exploits sparsity for up to
an additional 4.6× increase in throughput.

Thesis Committee:

Prof. Vivienne Sze (Thesis Supervisor)
Prof. Joel Emer (Thesis Supervisor)
Prof. Daniel Sanchez