Robust Regression via Hard Thresholding

  • Kush Bhatia ,
  • Prateek Jain ,
  • Purushottam Kar

Proceedings of the 28th Annual Conference on Advances in Neural Information Processing Systems (NIPS) |

Publication

We study the problem of Robust Least Squares Regression (RLSR) where several response variables can be adversarially corrupted. More specifically, for a data matrix $X\in \R^{p\times n}$ and an underlying model $\bto$, the response vector is generated as $\y=X^T\bto+\b$ where $\b\in \R^n$ is the corruption vector supported over at most $C\cdot n$ coordinates. Existing exact recovery results for RLSR focus solely on $L_1$-penalty based convex formulations and impose relatively strict model assumptions such as requiring the corruptions $\b$ to be selected independently of $X$. In this work, we study a simple hard-thresholding algorithm called \alg which, under mild conditions on $X$, can recover $\bto$ exactly even if $\b$ corrupts the response variables in an \emph{adversarial} manner, i.e. both the support and entries of $\b$ are selected adversarially after observing $X$ and $\bto$. Our results hold under \emph{deterministic} assumptions which are satisfied if $X$ is sampled from any sub-Gaussian distribution. Finally unlike existing results that apply only to a fixed $\bto$, generated independently of $X$, our results are \emph{universal} and hold for any $\bto\in \R^p$. Next, we propose gradient descent-based extensions of \alg that can scale efficiently to large scale problems, such as high dimensional sparse recovery, and prove similar recovery guarantees for these extensions. Empirically we find \alg, and more so its extensions, offering significantly faster recovery than the state-of-the-art $L_1$ solvers. Empirically we find \alg, and more so its extensions, offering significantly faster recovery than the state-of-the-art $L_1$ solvers. For instance, even on moderate-sized datasets (with $p=50K$) with around $40\%$ corrupted responses, a variant of our proposed method called \alg-HYB is more than $20\times$ faster than the best $L_1$ solver.