Treebeard: An Optimizing Compiler for Decision Tree Based ML Inference

IEEE/ACM International Symposium on Microarchitecture (MICRO) |

Organized by IEEE

Decision tree ensembles are among the most commonly used machine learning models. These models are used in a wide range of applications and are deployed at scale. Decision tree ensemble inference is usually performed with libraries such as XGBoost, LightGBM, and Sklearn. These libraries incorporate a fixed set of optimizations for the hardware targets they support. However, maintaining these optimizations is prohibitively expensive with the evolution of hardware. Further, they do not specialize the inference code to the model being used, leaving significant performance on the table. This paper presents TREEBEARD, an optimizing compiler that progressively lowers the inference computation to optimized CPU code through multiple intermediate abstractions. By applying model-specific optimizations at the higher levels, tree walk optimizations at the middle level, and machine-specific optimizations lower down, TREEBEARD can specialize inference code for each model on each supported CPU target. TREEBEARD combines several novel optimizations at various abstraction levels to mitigate architectural bottlenecks and enable SIMD vectorization of tree walks. We implement TREEBEARD using the MLIR compiler infrastructure and demonstrate its utility by evaluating it on a diverse set of benchmarks. TREEBEARD is significantly faster than state-of-the-art systems, XGBoost, Treelite and Hummingbird, by 2.6×, 4.7× and 5.4× respectively in a single-core execution setting, and by 2.3×, 2.7× and 14× respectively in multi-core settings.