Co-optimize DNN Arithmetics and Hardware System for Efficient Inference and Training

Speaker: Sai Qian Zhang

Location: 370 Jay Street, Room 825
Videoconference link: https://nyu.zoom.us/j/97294543933

Date: Monday, March 20, 2023

In recent years, we have seen a proliferation of sophisticated Deep Neural Network (DNN) architectures that have achieved state-of-the-art performance across a variety of domains. However, the algorithmic superiority of DNNs levies high latency and energy taxes at all computing scales, which poses significant challenges to the hardware platforms that execute them. Because DNN architectures and the hardware platforms executing them are tightly coupled, my research focuses on building a full-stack solution that co-optimizes DNNs across architecture, datatype, and the supporting hardware system to achieve efficient inference and training.

In this talk, I will first describe Column-Combining, an innovative pruning strategy that packs sparse filter matrices into a denser format for efficient deployment on a novel systolic architecture with a near-perfect utilization rate. I will then describe a bit-level quantization method named Term Quantization (TQ). Unlike conventional quantization methods that operate on individual values, Term Quantization is a group-based method that keeps a fixed number of the largest terms (nonzero bits in the binary representation) within a group of values, which in turn yields significantly lower quantization error than other quantization approaches at the same bitwidth. Next, I will introduce my work on facilitating the DNN training process. In particular, I will describe the Fast First, Accurate Second Training (FAST) system, which adaptively adjusts the precision of DNN operands for efficient training. Last but not least, I will conclude with some of my recent research efforts and future plans for further extending the frontiers of DNN training hardware efficiency by leveraging the underlying reversibility of DNN architectures.
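
To make the group-budget idea behind Term Quantization concrete, below is a minimal NumPy sketch under simplifying assumptions: values are first uniformly quantized with a single per-tensor scale, and the group size, term budget, and bit width are illustrative parameters rather than the settings used in the actual work.

```python
import numpy as np

def term_quantize(values, group_size=8, term_budget=16, num_bits=8):
    """Illustrative sketch of group-based term quantization.

    Each value is uniformly quantized to `num_bits` bits, then decomposed
    into its power-of-two terms (nonzero bits). Within every group of
    `group_size` values, only the `term_budget` largest terms are kept;
    the remaining terms are dropped.
    """
    v = np.asarray(values, dtype=np.float64)
    scale = np.max(np.abs(v)) / (2 ** (num_bits - 1) - 1)
    if scale == 0:
        scale = 1.0
    q = np.round(v / scale).astype(np.int64)   # signed fixed-point values
    signs, mags = np.sign(q), np.abs(q)

    out = np.zeros_like(q)
    for start in range(0, len(q), group_size):
        idx = range(start, min(start + group_size, len(q)))
        # Collect every term (value index, bit position) present in the group.
        terms = [(i, b) for i in idx for b in range(num_bits) if (mags[i] >> b) & 1]
        # Keep only the `term_budget` largest terms (highest bit positions).
        terms.sort(key=lambda t: t[1], reverse=True)
        for i, b in terms[:term_budget]:
            out[i] += 1 << b
    return signs * out * scale
```

Because each group retains at most a fixed number of nonzero bits, large values within a group keep their most significant terms while the total number of terms processed per group stays bounded, which is what enables the reduced quantization error at a given effective bitwidth.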