Improved Diversity Maximization Algorithms for Matching and Pseudoforest

APPROX

In this work we consider the diversity maximization problem: given a data set X of n elements and a parameter k, the goal is to pick a subset of X of size k that maximizes a certain diversity measure. [CH01] defined a variety of diversity measures based on pairwise distances between the points. Constant factor approximation algorithms were known for all of these diversity measures except “remote-matching”, for which only an O(log k) approximation was known. We present an O(1) approximation for this remaining notion. Further, we consider these notions from the perspective of composable coresets. [IMMM14] provided composable coresets with a constant factor approximation for all notions except “remote-pseudoforest” and “remote-matching”, for which they again obtained only an O(log k) approximation. Here we also close the gap up to constants and present constant factor composable coreset algorithms for these two notions. For remote-matching, our coreset has size only O(k); for remote-pseudoforest, for any ε>0, our coreset has size O(k^{1+ε}) and achieves an O(1/ε) approximation.
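To make the problem statement concrete, the following is a minimal brute-force sketch of the two objectives named above. It is not the algorithm of this work. It assumes the usual definitions from the diversity maximization literature, which the abstract itself does not spell out: remote-pseudoforest is the sum, over the chosen points, of the distance to the nearest other chosen point, and remote-matching is the weight of a minimum-weight perfect matching on the chosen points. The function names and the use of Euclidean distance are illustrative assumptions.

```python
from itertools import combinations
from math import dist


def remote_pseudoforest(points):
    """Sum over points of the distance to the nearest other point.

    Assumed definition (not stated in the abstract); needs >= 2 points.
    """
    return sum(
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


def remote_matching(points):
    """Weight of a minimum-weight perfect matching, by exhaustive recursion.

    Assumed definition (not stated in the abstract); needs an even number
    of points. Exponential time, so only suitable for tiny inputs.
    """
    if not points:
        return 0.0
    first, rest = points[0], points[1:]
    # Match the first point with every possible partner and recurse.
    return min(
        dist(first, rest[i]) + remote_matching(rest[:i] + rest[i + 1:])
        for i in range(len(rest))
    )


def brute_force_diverse_subset(points, k, measure):
    """Try every size-k subset and return one maximizing the measure."""
    return max(combinations(points, k), key=lambda s: measure(list(s)))
```

For example, `brute_force_diverse_subset(X, 2, remote_matching)` returns a farthest pair in `X`. The exhaustive search over all size-k subsets is only a reference point for what the approximation algorithms and coresets are measured against.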