The Price of Privacy and the Limits of LP Decoding

  • Cynthia Dwork,
  • Frank McSherry,
  • Kunal Talwar

Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC)

Published by Association for Computing Machinery, Inc.


This work is at the intersection of two lines of research. One line, initiated by Dinur and Nissim, investigates the price, in accuracy, of protecting privacy in a statistical database. The second, growing out of an extensive literature on compressed sensing (see in particular the work of Donoho and collaborators [14, 7, 13, 11]) and explicitly connected to error-correcting codes by Candès and Tao ([4]; see also [5, 3]), concerns the use of linear programming for error correction.

Our principal result is the discovery of a sharp threshold ρ∗ ≈ 0.239, so that if ρ < ρ∗ and A is a random m×n encoding matrix of independently chosen standard Gaussians, where m = O(n), then with overwhelming probability over the choice of A, for all x ∈ ℝⁿ, LP decoding corrects ⌊ρm⌋ arbitrary errors in the encoding Ax, while decoding can be made to fail if the error rate exceeds ρ∗. Our bound resolves an open question of Candès, Rudelson, Tao, and Vershynin [3] and (oddly, but explicably) refutes empirical conclusions of Donoho [11] and Candès et al. [3]. By scaling and rounding we can easily transform these results to obtain polynomial-time decodable random linear codes with polynomial-sized alphabets tolerating any ρ < ρ∗ ≈ 0.239 fraction of arbitrary errors.
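
Here LP decoding refers to ℓ1-minimization of the residual, as in Candès and Tao [4]: given a corrupted encoding y = Ax + e in which at most ⌊ρm⌋ coordinates of e are nonzero, the decoder returns the x′ minimizing ‖y − Ax′‖₁, which can be written as a linear program. The sketch below is not from the paper; the dimensions, error magnitudes, and seed are assumptions chosen only to illustrate the procedure at small scale with NumPy/SciPy.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

n, m = 50, 400                      # message length, code length (m = O(n)); illustrative sizes only
rho = 0.20                          # corruption rate, below the threshold rho* ~ 0.239

A = rng.standard_normal((m, n))     # random Gaussian encoding matrix
x = rng.standard_normal(n)          # message
y = A @ x                           # encoding Ax

# Adversarially corrupt floor(rho * m) coordinates with large, arbitrary values.
k = int(np.floor(rho * m))
bad = rng.choice(m, size=k, replace=False)
y_corrupt = y.copy()
y_corrupt[bad] += 10.0 * rng.standard_normal(k)

# LP decoding: min_{x', t} sum_i t_i  subject to  -t <= y_corrupt - A x' <= t.
# Variables are the n entries of x' followed by the m slack variables t.
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[-A, -np.eye(m)], [A, -np.eye(m)]])
b_ub = np.concatenate([-y_corrupt, y_corrupt])
bounds = [(None, None)] * n + [(0, None)] * m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

x_hat = res.x[:n]
print("max recovery error:", np.max(np.abs(x_hat - x)))   # typically ~0 at this corruption rate
```

At these small, arbitrarily chosen dimensions exact recovery is typical but not guaranteed; the theorem's guarantee is asymptotic, with overwhelming probability over the choice of A.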

In the context of privacy-preserving data mining, our results say that any privacy mechanism, interactive or non-interactive, providing reasonably accurate answers to a 0.761 fraction of randomly generated weighted subset sum queries, and arbitrary answers on the remaining 0.239 fraction, is blatantly non-private.
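
Concretely, the reconstruction attack is the same ℓ1-minimization: treat the query weights as the encoding matrix, the mechanism's answers as the corrupted encoding, solve the LP, and round the result to a 0/1 database. The following self-contained sketch uses assumed parameters (database size, number of queries, noise levels, and the 0.2 fraction of arbitrary answers are illustrative choices, not the paper's setup).

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m = 50, 400                                   # database size, number of queries (illustrative)

d = rng.integers(0, 2, size=n).astype(float)     # secret 0/1 database
Q = rng.standard_normal((m, n))                  # random weighted subset-sum queries

answers = Q @ d + 0.01 * rng.standard_normal(m)  # "reasonably accurate" answers
bad = rng.choice(m, size=int(0.2 * m), replace=False)
answers[bad] = 100.0 * rng.standard_normal(bad.size)   # arbitrary answers on the rest

# Attacker solves min ||answers - Q d'||_1 over d' in [0,1]^n, then rounds.
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[-Q, -np.eye(m)], [Q, -np.eye(m)]])
b_ub = np.concatenate([-answers, answers])
bounds = [(0, 1)] * n + [(0, None)] * m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

d_hat = np.round(res.x[:n])
print("entries reconstructed correctly:", int(np.sum(d_hat == d)), "of", n)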