The Johnson-Lindenstrauss Lemma

The so-called “curse of dimensionality” reflects the idea that many methods become more difficult in higher dimensions. This difficulty may be due to a number of issues that become more complicated in higher dimensions: e.g., NP-hardness, sample complexity, or algorithmic efficiency. Thus, for such problems, the data are often first transformed using some dimensionality reduction technique before applying the method of choice. In this post, we discuss a result in high-dimensional geometry regarding how much one can reduce the dimension while still preserving distances.

Johnson-Lindenstrauss Lemma

Given points \(z_1,z_2,\ldots,z_N \in \mathbb{R}^d\), we want to find points \(u_1,\ldots,u_N \in \mathbb{R}^k\), where \(k \ll d\), such that the distance between points is approximately preserved, i.e., for all \(i,j\), \[ ||z_i - z_j||_2 \leq ||u_i - u_j||_2 \leq (1+\epsilon) ||z_i-z_j||_2, \] where \(||\cdot||_2\) is the Euclidean (\(\ell_2\)) norm. Thus, we’d like to find some mapping \(f: \mathbb{R}^d \to \mathbb{R}^k\), where \(k \ll d\), that maps the data to a much lower dimension while satisfying the above inequalities.
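To make the requirement concrete, here is a minimal sketch (in Python with NumPy; the function name and array conventions are my own, not from the references) that checks whether a candidate embedding satisfies these inequalities:

```python
import numpy as np

def distances_preserved(Z, U, eps):
    """Return True if the embedding U of the points Z satisfies
    ||z_i - z_j||_2 <= ||u_i - u_j||_2 <= (1 + eps) * ||z_i - z_j||_2
    for every pair (i, j).  Z has shape (N, d); U has shape (N, k)."""
    N = Z.shape[0]
    for i in range(N):
        for j in range(i + 1, N):
            dz = np.linalg.norm(Z[i] - Z[j])
            du = np.linalg.norm(U[i] - U[j])
            if not (dz <= du <= (1 + eps) * dz):
                return False
    return True
```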

The Johnson-Lindenstrauss Lemma (JL lemma) tells us that we only need dimension \(k = O\left(\frac{\log N}{\epsilon^2}\right)\), and that the mapping \(f\) can be taken to be a (random) linear mapping. The proof of this lemma essentially proceeds by constructing \(f\) via a random projection, and then showing that for each pair \(i,j\), the random variable \(||u_i - u_j||_2^2\) concentrates around its expectation.
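For a sense of scale, with \(N = 10^6\) points and \(\epsilon = 0.1\), we have \(\frac{\log N}{\epsilon^2} \approx \frac{13.8}{0.01} \approx 1400\); so, ignoring the unspecified constant, a target dimension on the order of a few thousand suffices, no matter how large the original dimension \(d\) is.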

This argument is an example of the probabilistic method, a nonconstructive way of proving the existence of entities with particular properties: if the probability of getting an entity with the desired property is positive, then this entity must be an element of the sample space and therefore exists.

Proof of the JL lemma

We randomly choose vectors \(x_1, \ldots, x_k\), where each \(x_n \in \mathbb{R}^d\), by choosing each coordinate of the vector uniformly at random from the set \[ \left\{\left(\frac{1+\epsilon}{k}\right)^{\frac{1}{2}},-\left(\frac{1+\epsilon}{k}\right)^{\frac{1}{2}}\right\}. \]

Now consider the mapping from \(\mathbb{R}^d\) to \(\mathbb{R}^k\) defined by the inner products of \(z \in \mathbb{R}^d\) with the random vectors: \[z \mapsto (z^\top x_1, \ldots, z^\top x_k).\] So, each vector \(u_i = (z_i^\top x_1, \ldots, z_i^\top x_k)\) is obtained via a random projection. (Alternatively, we can think of the mapping as a random linear transformation given by a random matrix \(A \in \mathbb{R}^{k \times d}\), where the vectors \(x_1, \ldots, x_k\) form the rows of the matrix, and \(u_i = A z_i\).) The goal is to show that there exists some choice of \(x_1, \ldots, x_k\) that satisfies the above inequalities.
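As an illustration, here is a sketch of this construction (NumPy again; the function and variable names are my own): each entry of the \(k \times d\) matrix is drawn uniformly from \(\pm\sqrt{(1+\epsilon)/k}\), and the map is just matrix multiplication.

```python
import numpy as np

def random_projection_matrix(d, k, eps, rng=None):
    """Sample a k x d matrix whose rows play the role of x_1, ..., x_k:
    each entry is drawn uniformly from {+sqrt((1+eps)/k), -sqrt((1+eps)/k)}."""
    rng = np.random.default_rng() if rng is None else rng
    signs = rng.choice([-1.0, 1.0], size=(k, d))
    return signs * np.sqrt((1 + eps) / k)

def project(Z, A):
    """Map each row z of Z (shape (N, d)) to (z^T x_1, ..., z^T x_k),
    i.e., compute u = A z for every point; the result has shape (N, k)."""
    return Z @ A.T
```

Given data \(Z\), one would draw \(A\), compute `U = project(Z, A)`, and, since the guarantee is only probabilistic, resample \(A\) if the check from the earlier sketch fails.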

Fixing a pair \(i, j\), define \(z = z_i - z_j\) and \(u = u_i - u_j\), and note that \(u = (z^\top x_1, \ldots, z^\top x_k)\). Thus, we can write the squared norm of \(u\) as \[ ||u||_2^2 = \sum_{n=1}^{k} Y_n, \qquad Y_n = (z^\top x_n)^2 = \left(\sum_{m=1}^{d} z_m x_{nm}\right)^2, \] where \(x_{nm}\) refers to the \(m\)th component of the \(n\)th vector \(x_n\).

Now we consider the random variable \(Y_n\) in the sum. The expectation is \[ \mathbb{E}(Y_n) = \frac{1+\epsilon}{k} ||z||_2^2. \] So, the expectation of \(||u||_2^2\) is \[ \mu := \mathbb{E}(||u||_2^2) = \sum_{n=1}^{k} \mathbb{E}(Y_n) = (1+\epsilon) ||z||_2^2. \]
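To see where the expectation of \(Y_n\) comes from, expand the square: the coordinates \(x_{nm}\) are independent with mean zero, and \(x_{nm}^2 = \frac{1+\epsilon}{k}\) deterministically, so the cross terms vanish: \[ \mathbb{E}(Y_n) = \sum_{m=1}^{d} \sum_{m'=1}^{d} z_m z_{m'}\, \mathbb{E}(x_{nm} x_{nm'}) = \sum_{m=1}^{d} z_m^2 \cdot \frac{1+\epsilon}{k} = \frac{1+\epsilon}{k} ||z||_2^2. \]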

It remains to show that \(||u||_2^2\) concentrates around its mean \(\mu\), which we can do using a Chernoff bound. In particular, consider the two “bad” events \(||u||_2^2 > (1+\delta)\mu\) and \(||u||_2^2 < (1-\delta)\mu\). Via a Chernoff bound, the probability of at least one of these two bad events occurring is upper bounded by \[ \Pr[\{ ||u||_2^2 > (1+\delta) \mu\} \lor \{||u||_2^2 < (1-\delta) \mu \}] < \exp(-c \delta^2 k), \] for some constant \(c > 0\).
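A quick numerical sanity check of this concentration, as a sketch with arbitrary example sizes (the function name and default parameters are my own):

```python
import numpy as np

def estimate_bad_event_probability(d=500, k=200, eps=0.1, delta=0.3,
                                   trials=1000, seed=0):
    """Empirically estimate Pr[||u||^2 outside (1 +/- delta) * mu] for a fixed
    vector z, where u = A z, mu = (1 + eps) * ||z||^2, and A is resampled
    independently on each trial."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(d)              # plays the role of z_i - z_j
    mu = (1 + eps) * np.dot(z, z)
    bad = 0
    for _ in range(trials):
        A = rng.choice([-1.0, 1.0], size=(k, d)) * np.sqrt((1 + eps) / k)
        u = A @ z
        norm2 = np.dot(u, u)
        if norm2 > (1 + delta) * mu or norm2 < (1 - delta) * mu:
            bad += 1
    return bad / trials
```

For these settings the estimate should be close to zero, and increasing \(k\) or \(\delta\) should drive it down further, consistent with the exponential bound.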

Recall that \(u = u_i - u_j\) for a fixed pair \(i, j\), and so there are \(\binom{N}{2} < N^2\) such random variables \(||u_i - u_j||_2^2\). Now choosing \(\delta = \frac{\epsilon}{1+\epsilon}\), the probability that any of these random variables is outside of a factor \((1 \pm \delta)\) of its expected value is bounded by \[ N^2 \exp(-c \delta^2 k), \]

which follows from a union bound.

Choosing \(k \geq \frac{3 \log N}{c \delta^2} = O\left(\frac{\log N}{\epsilon^2}\right)\) ensures that this bound is at most \(1/N < 1\), so with positive probability all variables are within a factor \((1 \pm \delta)\) of their expectations, i.e., \((1-\delta)\mu \leq ||u_i - u_j||_2^2 \leq (1+\delta)\mu\) for all \(i, j\). Thus, rewriting this using \(\mu = (1+\epsilon)||z_i - z_j||_2^2\) and noting that \((1-\delta)(1+\epsilon) = 1\) and \((1+\delta)(1+\epsilon) = 1 + 2\epsilon\) for our choice of \(\delta\), we have that for all \(i, j\), \[ ||z_i - z_j||_2^2 \leq ||u_i - u_j||_2^2 \leq (1+2\epsilon) ||z_i - z_j||_2^2, \]

which, after taking square roots and using \(\sqrt{1+2\epsilon} \leq 1+\epsilon\), implies the desired result: \[ ||z_i - z_j||_2 \leq ||u_i - u_j||_2 \leq (1+\epsilon) ||z_i - z_j||_2. \]
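Putting the pieces together, a minimal end-to-end sketch (reusing the hypothetical helpers from the earlier sketches, with arbitrary example sizes):

```python
import numpy as np

# Draw random points, project them, and check the JL condition, using
# distances_preserved, random_projection_matrix, and project from above.
rng = np.random.default_rng(1)
N, d, eps = 30, 2000, 0.5
k = 500                                   # illustrative; the lemma needs only O(log N / eps^2)

Z = rng.standard_normal((N, d))           # original points in R^d
A = random_projection_matrix(d, k, eps, rng)
U = project(Z, A)                         # projected points in R^k

# Should print True with high probability for these sizes; if it prints False,
# simply resample A -- the lemma guarantees a good A exists.
print(distances_preserved(Z, U, eps))
```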

References

  1. Arora and Kothari. High Dimensional Geometry, Curse of Dimensionality, Dimension Reduction.
  2. Dasgupta and Gupta. An Elementary Proof of a Theorem of Johnson and Lindenstrauss.
  3. Nelson. Dimensionality Reduction Notes, Parts 1 and 2.