Towards High-Performance Prediction Serving Systems

Yunseong Lee; Alberto Scolari; Byung-Gon Chun; Matteo Interlandi; Markus Weimer

31st Conference on Neural Information Processing Systems | December 2017

Organized by NIPS

Machine learning models are often composed of sequences of transformations. While this design makes it easy to decompose and efficiently execute individual model components at training time, prediction serving requires low latency and predictable high performance, and meeting these goals calls for end-to-end and multi-model runtime optimizations. This paper sheds some light on the problem by introducing a new system design for high-performance prediction serving. We report preliminary results showing that our system design improves performance along several dimensions with respect to current state-of-the-art approaches.