NUMA-aware algorithms: the case of data shuffling

  • ,
  • Ippokratis Pandis ,
  • René Müller ,
  • Vijayshankar Raman ,
  • Guy M. Lohman

CIDR 2013, Sixth Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 6-9, 2013, Online Proceedings |

In recent years, a new breed of non-uniform memory access (NUMA) systems has emerged: multi-socket servers of multicores. This paper makes the case that data management systems need to employ designs that take into consideration the characteristics of modern NUMA hardware. To prove our point, we focus on a primitive that is used as the building block of numerous data management operations: data shuffling. We perform a comparison of different data shuffling algorithms and show that a na¨ıve data shuffling algorithm can be up to 3× slower than the highest performing, NUMAaware one. To achieve the highest performance, we employ a combination of thread binding, NUMA-aware thread allocation, and relaxed global coordination among threads. The importance of such NUMA-aware algorithm designs will only increase, as future server systems are expected to feature ever larger numbers of sockets and increasingly complicated memory subsystems.