NASerEx: Optimizing Early Exits via AutoML for Scalable Efficient Inference in Big Image Streams
- Aakash Kapoor
- Rajath Elias Soans
- Soham Dixit
- Pradeep NS
- Brijraj Singh
- Mayukh Das
Published by IEEE
Special session on Machine Learning on Big Data (MLBD)
We investigate the problem of achieving operational efficiency at scale for machine learning models on big data streams, in the context of embedded AI applications, by learning optimal early exits. Embedded AI applications that employ deep neural models depend on efficient model inference at scale, especially on resource-constrained hardware. Recent vision, text, and audio models are computationally complex, with huge parameter spaces, and each input sample typically passes through many layers, each involving large tensor computations, before a valid output is produced. In most real scenarios, AI applications deal with big data streams, such as streams of audio signals, static images, and/or high-resolution video frames. The deep ML models powering such applications must perform inference continuously on these streams for varied tasks such as noise suppression, face detection, and gait estimation. Ensuring efficiency is challenging even with model compression techniques, since they reduce model size but often fail to deliver scalable inference efficiency over continuous streams.
Early exits enable adaptive inference by extracting valid outputs from a pre-final layer of a deep model. This significantly boosts efficiency at scale, especially for big streams, since many input instances need not be processed by all layers of the model. Designing a suitable early exit structure (the number of exits and their positions) is a difficult but crucial part of improving efficiency without any loss in predictive performance, especially in the context of big streams. Naive manual early exit design that does not consider hardware capacity or data stream characteristics is counterproductive. We propose the NASerEx framework, which leverages Neural Architecture Search (NAS) with a novel saliency-constrained search space and exit decision metric to learn a suitable early exit structure that augments deep neural models for scalable, efficient inference on big image streams. Optimized exit-augmented models run ~2.5x faster with ~4x lower aggregate effective FLOPs, with no significant accuracy loss.
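To illustrate the general early-exit idea, here is a minimal sketch in Python. It is not the NASerEx method: the exit rule is a plain max-softmax confidence threshold rather than the paper's exit decision metric, and the names `early_exit_infer`, `stages`, `heads`, `stage_costs`, and `threshold` are hypothetical. The toy stages and heads are stand-ins for real network blocks and exit classifiers.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_infer(
    x: Vector,
    stages: Sequence[Callable[[Vector], Vector]],
    heads: Sequence[Callable[[Vector], Vector]],
    stage_costs: Sequence[float],
    threshold: float = 0.9,
) -> Tuple[int, int, float]:
    """Run stages in order and stop at the first confident exit head.

    Returns (predicted class, index of the exit taken, cost spent).
    Assumption: confidence is the max softmax probability; the final
    stage always produces an output regardless of confidence.
    """
    h = x
    used = 0.0
    last = len(stages) - 1
    for i, (stage, head) in enumerate(zip(stages, heads)):
        h = stage(h)                  # backbone computation up to exit i
        used += stage_costs[i]        # accumulate the "effective" cost
        probs = softmax(head(h))      # exit classifier attached at stage i
        conf = max(probs)
        if conf >= threshold or i == last:
            return probs.index(conf), i, used
    raise ValueError("model must have at least one stage")


# Toy model (hypothetical): each stage doubles the features, and each
# exit head uses the features directly as two-class logits.
stages = [lambda h: [2.0 * v for v in h]] * 3
heads = [lambda h: list(h)] * 3
costs = [1.0, 2.0, 3.0]               # illustrative per-stage costs

easy = early_exit_infer([2.0, 0.0], stages, heads, costs)
hard = early_exit_infer([0.1, 0.0], stages, heads, costs)
print("easy sample:", easy)
print("hard sample:", hard)
```

In this toy setup the clearly separable input leaves at the first exit and pays only the first stage's cost, while the ambiguous input runs through every stage. Averaged over a stream, that gap is the source of the savings early exits aim for. Where to place the exits and how many to use is the design question the abstract describes as the hard part.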