Piper: Multidimensional Planner for DNN Parallelization

NeurIPS 2021

The rapid increase in the sizes of state-of-the-art DNN models, and in the compute and memory requirements of training them, has led to the development of many execution schemes such as data parallelism, pipelined model parallelism, tensor (intra-layer) model parallelism, and various memory-saving optimizations. However, no prior work has tackled the highly complex problem of finding the optimal partitioning of the DNN computation graph across many accelerators while combining the above modes of parallelism and optimizations. In this work we introduce Piper, an efficient optimization algorithm for this problem based on dynamic programming and a two-level approach. Our two-level approach is driven by the insight that taking tensor-parallelization techniques for individual layers as given (e.g., Megatron-LM) significantly reduces the search space compared with considering arbitrary tensor-parallel configurations of the entire DNN operator graph, and makes the global problem tractable.
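
To make the two-level idea concrete, below is a minimal, hypothetical sketch in Python. It assumes a linear chain of layers (rather than a general operator graph), a small per-layer menu of tensor-parallel costs standing in for the given per-layer parallelization techniques, contiguous pipeline stages, and a bottleneck-time objective; it ignores communication, memory constraints, and data parallelism, so it is only an illustration of the outer dynamic program over stages combined with an inner per-layer configuration choice, not Piper's actual algorithm. All names and numbers are invented.

```python
# Illustrative two-level planner sketch (not the Piper algorithm itself).
# Outer level: dynamic program that splits a layer chain into pipeline stages
# and assigns a device budget to each stage, minimizing the slowest stage.
# Inner level: each layer in a stage picks a configuration from a given
# tensor-parallel cost menu (a crude stand-in for per-layer techniques
# such as Megatron-LM splits).
from functools import lru_cache

# cost_menu[layer][tp_degree] -> per-microbatch compute time for that layer
# when split with the given tensor-parallel degree (hypothetical numbers).
cost_menu = [
    {1: 8.0, 2: 4.4, 4: 2.6},   # layer 0
    {1: 6.0, 2: 3.3, 4: 2.0},   # layer 1
    {1: 9.0, 2: 4.9, 4: 2.9},   # layer 2
    {1: 5.0, 2: 2.8, 4: 1.8},   # layer 3
]
NUM_LAYERS = len(cost_menu)
NUM_DEVICES = 8

def stage_time(lo, hi, devices):
    """Inner level: time of a stage holding layers [lo, hi) on `devices` GPUs.

    Every layer in the stage uses the largest tensor-parallel degree from its
    menu that does not exceed `devices` -- a simplification of the per-layer
    configuration search.
    """
    total = 0.0
    for layer in range(lo, hi):
        feasible = [t for t in cost_menu[layer] if t <= devices]
        if not feasible:
            return float("inf")
        total += cost_menu[layer][max(feasible)]
    return total

@lru_cache(maxsize=None)
def best(prefix, budget):
    """Outer level: min bottleneck time for layers [0, prefix) on `budget` devices."""
    if prefix == 0:
        return 0.0
    result = float("inf")
    # Choose the last stage: layers [split, prefix) run on `d` devices.
    for split in range(prefix):
        for d in range(1, budget + 1):
            candidate = max(best(split, budget - d), stage_time(split, prefix, d))
            result = min(result, candidate)
    return result

if __name__ == "__main__":
    print("bottleneck stage time:", best(NUM_LAYERS, NUM_DEVICES))
```

The sketch illustrates why fixing the per-layer tensor-parallel options helps: the outer search only decides stage boundaries and device budgets, while each layer's intra-layer configuration is looked up from a small menu instead of being searched over all possible tensor partitionings.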