Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm with quite a few effective implementations. Although these implementations adopt many engineering optimizations, their efficiency and scalability remain unsatisfactory when the feature dimension is high and the data size is large. A major reason is that, for each feature, they need to scan all data instances to estimate the information gain of every possible split point, which is very time consuming. LightGBM is an open-source GBDT tool that enables highly efficient training over large-scale datasets with low memory cost. LightGBM adopts two novel techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). With GOSS, LightGBM can train each tree with only a small fraction of the full dataset. With EFB, LightGBM handles high-dimensional sparse features much more efficiently. LightGBM also supports distributed training with low communication cost and fast training on GPUs.
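As a minimal sketch of how these techniques are exposed in the LightGBM Python package, the snippet below trains a booster with GOSS enabled and feature bundling left on. It assumes the `lightgbm` and `numpy` packages are installed; the synthetic data is for illustration only, and parameter spellings can vary by release (newer versions may prefer `data_sample_strategy="goss"` over `boosting="goss"`), so consult the official documentation for your installed version.

```python
import numpy as np
import lightgbm as lgb

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.random((10_000, 200))
y = (X[:, 0] + 0.5 * X[:, 1] > 0.75).astype(int)

train_data = lgb.Dataset(X, label=y)

params = {
    "objective": "binary",
    "boosting": "goss",      # Gradient-based One-Side Sampling
    "top_rate": 0.2,         # keep the 20% of instances with the largest gradients
    "other_rate": 0.1,       # randomly sample 10% of the remaining instances
    "enable_bundle": True,   # Exclusive Feature Bundling (on by default)
    "num_leaves": 31,
    "learning_rate": 0.05,
}

booster = lgb.train(params, train_data, num_boost_round=100)
```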
People
Tie-Yan Liu
Distinguished Scientist, Microsoft Research AI for Science
Yu Shi
Researcher