Metric Embeddings with Outliers
Host
We study the design of embeddings into Euclidean space with outliers. Given a metric space (X, d) and an integer k, the goal is to embed all but k points in X (called the “outliers”) into ℓ2 with the smallest possible distortion c. Finding the optimal distortion c for a given outlier set size k, or alternately the smallest k for a given target distortion c are both NP-hard problems. We consider bi-criteria approximations. Our main result is a polynomial time algorithm that approximates the outlier set size to within an O(log^2 k) factor and the distortion to within a constant factor.
We also show that the techniques we use in designing outlier embeddings into Euclidean space have the potential to extend to a wider variety of target embedding spaces. In particular, we consider probabilistic outlier embeddings, which involve probabilistic embeddings and only require that non-outlier pairs have low distortion in expectation over the randomness of the embedding. We show that for probabilistic outlier embeddings into hierarchically separated trees (HSTs), there is a polynomial time algorithm that takes in a finite metric space with a (k,c) probabilistic outlier embedding and produces a probabilistic outlier embedding with O(log^2 k) outliers and O(c) distortion.