I haven't been able to find literature on the efficient solving of the following problem.

Given $n$ random points $x_i$ in the unit square $(0, 1)^2$, obtain the flat clusters of points, such that two points are in the same cluster if their pairwise distance satisfies $d(x_i, x_j) < r$, where $r < 1$ is a fixed parameter of the problem.

The naive solution checks each pairwise distance, therefore completing in $\mathcal{O}(n^2)$ time. Is there a faster way, assuming $n$ large and $r \ll 1$?
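For reference, the naive approach can be sketched as follows. This is only an illustration: it assumes that "same cluster" means connected components of the graph joining points closer than $r$, and the function name `naive_clusters` is made up for the example.

```python
import math
from itertools import combinations


def naive_clusters(points, r):
    """Label each point with a cluster id by checking all O(n^2) pairs."""
    parent = list(range(len(points)))

    def find(a):
        # Follow parent links to the root, halving the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < r:
            parent[find(i)] = find(j)  # merge the two clusters

    return [find(i) for i in range(len(points))]
```

Two points get the same label exactly when a chain of hops, each shorter than $r$, joins them.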

David Richerby

1 Answer

Unfortunately, the following algorithm does not get rid of the $n^2$ factor (see comment below).

Superimpose a grid of $r/\sqrt 2$-by-$r/\sqrt 2$ cells on the unit square. Each cell has diagonal $r$, so all points within the same grid cell are automatically in the same cluster. Also, at most $n$ grid cells contain one or more points; these are the only grid cells we need to consider from this point on.

  1. Map each point to its grid cell.
  2. For each grid cell, throw all contained points into the same cluster.
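  3. For each pair of nearby grid cells $P$, $Q$, determine the closest pair of points $p, q$ with $p \in P$ and $q \in Q$ using an $O(n\log n)$ closest-pair algorithm. Because the cell side $r/\sqrt 2$ is shorter than $r$, points closer than $r$ can lie up to two cells apart in either coordinate, so "nearby" must cover those cells and not just the 8 adjacent ones. If $d(p, q) < r$, merge the clusters $C(P)$ and $C(Q)$ containing the two cells' points; otherwise leave them separate.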

You can use a union-find (disjoint-set) data structure to maintain the clusters in close-to-linear time, just as in Kruskal's algorithm.
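The steps above might be sketched roughly as follows. It is an illustration under stated assumptions, not a tuned implementation: the names `DisjointSet` and `grid_clusters` are made up, and the closest-pair step is swapped for a plain early-exit scan over the two cells' points instead of the $O(n\log n)$ algorithm. The sketch also scans cell offsets up to two in each coordinate, because with cell side $r/\sqrt 2$ two points closer than $r$ can sit two cells apart.

```python
import math
from collections import defaultdict


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def grid_clusters(points, r):
    """Return one cluster label per point (labels are union-find roots)."""
    side = r / math.sqrt(2)

    # Step 1: map each point index to its grid cell.
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(int(x // side), int(y // side))].append(i)

    # Step 2: points sharing a cell are closer than the diagonal r.
    ds = DisjointSet(len(points))
    for members in cells.values():
        for i in members[1:]:
            ds.union(members[0], i)

    # Step 3: merge nearby cells that contain a pair closer than r.
    r2 = r * r
    for (cx, cy), members in cells.items():
        for dx in range(-2, 3):
            for dy in range(-2, 3):
                other = (cx + dx, cy + dy)
                # Keep only "forward" offsets so each cell pair is seen once.
                if (dx, dy) <= (0, 0) or other not in cells:
                    continue
                a, b = members[0], cells[other][0]
                if ds.find(a) == ds.find(b):
                    continue  # already in the same cluster
                # Brute-force stand-in for the closest-pair query.
                if any(
                    (points[i][0] - points[j][0]) ** 2
                    + (points[i][1] - points[j][1]) ** 2 < r2
                    for i in members
                    for j in cells[other]
                ):
                    ds.union(a, b)

    return [ds.find(i) for i in range(len(points))]
```

Each occupied cell looks at a constant number of nearby cells, so any remaining quadratic cost comes from the point-to-point scan when many points crowd into a few cells.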

j_random_hacker