A Reinforcement-Learning-based Energy-Efficient Framework for Multi-Task Video Analytics Pipeline

  • Yingying Zhao,
  • Mingzhi Dong,
  • Yujiang Wang,
  • Da Feng,
  • Qin Lv,
  • Robert P. Dick,
  • Tun Lu,
  • Ning Gu,
  • Li Shang

IEEE Transactions on Multimedia


Deep-learning-based video processing has yielded transformative results in recent years. However, the video analytics pipeline is energy intensive due to high data rates and reliance on complex inference algorithms, which limits its adoption in energy-constrained applications. Motivated by the observation of high and variable spatial redundancy and temporal dynamics in video data streams, we design and evaluate an adaptive-resolution optimization framework to minimize the energy use of multi-task video analytics pipelines. Instead of heuristically tuning the input data resolution of individual tasks, our framework utilizes deep reinforcement learning to dynamically govern the input resolution and computation of the complete video analytics pipeline. By monitoring the impact of variable resolution on the quality of high-dimensional video analytics features, and hence on the accuracy of video analytics results, the proposed end-to-end optimization framework learns the best non-myopic policy for dynamically controlling the resolution of input video streams to achieve globally optimized energy efficiency. Governed by reinforcement learning, optical flow is incorporated into the framework to minimize unnecessary re-computation of spatial features induced by temporal redundancy while preserving analytics performance. The proposed framework is applied to video instance segmentation, one of the most challenging machine vision tasks, and achieves the best energy efficiency on the YouTube-VIS dataset among all baseline methods.