Algorithm for clustering points in $\mathbb{R}^2$ based on pairwise distance (percolation graphs)

Question

I haven't been able to find literature on solving the following problem efficiently.

Given $n$ random points $x_i$ in the unit square $(0, 1)^2$, obtain the flat clusters of points, such that two points are in the same cluster if their pairwise distance satisfies $d(x_i, x_j) < r$, where $r < 1$ is a fixed parameter of the problem.

The naive solution checks each pairwise distance and therefore runs in $\mathcal{O}(n^2)$ time. Is there a faster way, assuming $n$ is large and $r \ll 1$?
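For reference, the naive baseline could look like the sketch below. It assumes that clusters are the connected components of the graph joining every pair of points closer than $r$ (so membership is transitive); the function name `naive_clusters` and the label format are illustrative choices, not part of the question.

```python
import math
from itertools import combinations

def naive_clusters(points, r):
    """Label connected components of the graph linking points closer than r."""
    n = len(points)
    # O(n^2) step: test every pair once and record an edge when d < r.
    adjacency = [[] for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if math.dist(points[i], points[j]) < r:
            adjacency[i].append(j)
            adjacency[j].append(i)

    # Flood-fill from each unlabelled point to collect its whole cluster.
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adjacency[u]:
                if labels[v] == -1:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels
```

Any faster method should return the same partition as this function, which makes it a convenient correctness check on small inputs.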

j_random_hacker · Accepted Answer · 2016-11-04T17:12:28.127

The following algorithm unfortunately does not allow you to get the $n^2$ factor out (see comment below).

Superimpose a grid of $r/\sqrt 2$-by-$r/\sqrt 2$ cells on the unit square. Each cell has diagonal $r$, so (taking cells to be half-open) any two points within the same grid cell are less than $r$ apart and are automatically within the same cluster. Also notice that there can be at most $n$ grid cells containing one or more points; these are the only grid cells we need to consider from this point on.

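1. Map each point to its grid cell.
2. For each grid cell $P$, throw all contained points into the same cluster $C(P)$.
3. For each pair of neighbouring grid cells $P$, $Q$, determine the closest pair of points $p, q$ having $p \in P$ and $q \in Q$ using an $O(n\log n)$ closest-pair algorithm. If $d(p, q) < r$ then combine $C(P)$ and $C(Q)$, otherwise not. Because the cell side $r/\sqrt 2$ is smaller than $r$, the 8 adjacent cells are not enough: the neighbours of a cell are all cells whose row and column offsets are each at most 2, except the four at offset $(\pm 2, \pm 2)$, whose points are always more than $r$ away. That gives at most 20 neighbours per cell, fewer at edges and corners.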
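The bucketing and neighbour scan can be sketched as follows. This is a rough illustration under some assumptions: the names `bucket_points`, `linked_cell_pairs` and `OFFSETS` are made up here, the closest pair between two cells is found by brute force rather than by an $O(n\log n)$ routine, and the scan covers cells up to two steps away, since with side $r/\sqrt 2$ two points closer than $r$ can sit in cells that are not adjacent.

```python
import math
from collections import defaultdict

# Offsets of cells that may hold a point within distance r of a given cell.
# Assumption: with side r/sqrt(2) this reaches two cells away, except the
# (+-2, +-2) corners, whose points are always farther apart than r.
OFFSETS = [(dx, dy) for dx in range(-2, 3) for dy in range(-2, 3)
           if (dx, dy) != (0, 0) and abs(dx) + abs(dy) < 4]

def bucket_points(points, r):
    """Map each occupied grid cell (side r / sqrt(2)) to its point indices."""
    side = r / math.sqrt(2)
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // side), int(y // side))].append(idx)
    return cells

def linked_cell_pairs(points, r):
    """Return the cells and the cell pairs whose closest cross pair is < r."""
    cells = bucket_points(points, r)
    linked = []
    for cell, members in cells.items():
        for dx, dy in OFFSETS:
            other = (cell[0] + dx, cell[1] + dy)
            # Visit each unordered pair once; skip unoccupied cells.
            if other <= cell or other not in cells:
                continue
            # Brute-force stand-in for a faster closest-pair routine.
            closest = min(math.dist(points[i], points[j])
                          for i in members for j in cells[other])
            if closest < r:
                linked.append((cell, other))
    return cells, linked
```

Only occupied cells are stored, so the dictionary never holds more than $n$ entries, and each cell is compared with a constant number of others.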

You can use a union-find (disjoint-set) data structure to maintain the clusters in close-to-linear time, just as in Kruskal's algorithm.
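A minimal union-find sketch, assuming cells are identified by hashable keys such as `(column, row)` tuples; the class name `DisjointSet` and its lazy registration of unseen items are illustrative choices.

```python
class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, item):
        # Register unseen items lazily as singleton sets.
        if item not in self.parent:
            self.parent[item] = item
            self.size[item] = 1
        # Path halving: point each visited node at its grandparent.
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        """Merge the sets of a and b; return False if already merged."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        # Attach the smaller tree under the larger one.
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]
        return True
```

Calling `union(P, Q)` for every pair of cells found to be within $r$ of each other, and then `find(P)` for each occupied cell, gives one representative cell per cluster; every point inherits the cluster of its cell.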
