
I have two sets $S,T$ of points in the 2-dimensional plane. I want to find the closest pair of points $s,t$ such that $s \in S$, $t \in T$, and the Euclidean distance between $s,t$ is as small as possible. How efficiently can this be done? Can it be done in $O(n \log n)$ time, where $n = |S|+|T|$?
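For reference, the obvious baseline compares every $s \in S$ against every $t \in T$, which takes $O(|S| \cdot |T|)$ time. The sketch below is only a correctness check for faster methods; the function name `closest_cross_pair_bruteforce` and the sample points are made up for illustration.

```python
from math import dist, inf

def closest_cross_pair_bruteforce(S, T):
    """Return (distance, s, t) minimizing the Euclidean distance over s in S, t in T."""
    best = (inf, None, None)
    for s in S:
        for t in T:
            d = dist(s, t)  # Euclidean distance (Python 3.8+)
            if d < best[0]:
                best = (d, s, t)
    return best

# Hypothetical sample input.
S = [(0.0, 0.0), (5.0, 5.0)]
T = [(1.0, 1.0), (5.0, 7.0), (9.0, 9.0)]
d, s, t = closest_cross_pair_bruteforce(S, T)
```

Any sub-quadratic algorithm should return a pair at the same distance as this baseline on the same input.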

I know that if I'm given a single set $S$, then it's possible to find the closest pair of points $s,s' \in S$ in $O(n \log n)$ time using a standard divide-and-conquer algorithm. However, that algorithm doesn't seem to generalize to the case of two sets, because there's no connection between the distance between the two closest points within $S$ or $T$ vs. the distance between the two closest points across those sets.

I thought of storing the set $T$ in a $k$-d tree, then for each $s \in S$, using a nearest-neighbor query to find the closest point in $T$ to $s$. However, the worst-case running time of this might be as bad as $O(n^2)$ time. There are results saying that if the points of $T$ are randomly distributed, then the expected running time for each query is $O(\log n)$, so we'd obtain an algorithm with expected running time $O(n \log n)$ if we were guaranteed that the points are randomly distributed -- but I'm looking for an algorithm that will work for any collection of points (not necessarily randomly distributed).
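To make the $k$-d tree idea concrete, here is a minimal pure-Python sketch of what I have in mind. All names (`build_kdtree`, `nearest`, `closest_cross_pair_kdtree`) are hypothetical, the tree is built by re-sorting at every level (so construction is not optimal), and nothing here improves the worst-case query time; it only illustrates the approach.

```python
from math import inf

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def build_kdtree(points, depth=0):
    """Build a 2-d tree as nested tuples (point, left, right), splitting on x, then y."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=(inf, None)):
    """Return (squared distance, point) for the tree point closest to query."""
    if node is None:
        return best
    point, left, right = node
    d = sq_dist(point, query)
    if d < best[0]:
        best = (d, point)
    axis = depth % 2
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # The far side can only help if the splitting line is closer than the best so far.
    if diff * diff < best[0]:
        best = nearest(far, query, depth + 1, best)
    return best

def closest_cross_pair_kdtree(S, T):
    """Return (squared distance, s, t) using one nearest-neighbor query per s in S."""
    tree = build_kdtree(T)
    best = (inf, None, None)
    for s in S:
        d, t = nearest(tree, s)
        if d < best[0]:
            best = (d, s, t)
    return best

# Hypothetical sample input.
result = closest_cross_pair_kdtree([(0, 0), (4, 1)], [(2, 2), (5, 1), (9, 3)])
```

The pruning test in `nearest` is what makes typical queries fast, but on adversarial inputs it may fail to prune, and a query can then visit most of the tree.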

Motivation: An efficient algorithm would be useful for this other question.

D.W.

3 Answers


Yes, this can be done in $O(n \log n)$ time. Build a Voronoi diagram for $T$. Then, for each point $s \in S$, find the cell of the Voronoi diagram that contains it. The site of that cell (the point of $T$ that generates it) is the point $t \in T$ closest to $s$. Taking the minimum over all $s \in S$ gives the closest pair.

You can build a Voronoi diagram in $O(n \log n)$ time. After preprocessing it into a planar point-location structure, which also takes $O(n \log n)$ time, each query (find the cell containing $s$) can be answered in $O(\log n)$ time, so the total running time is $O(n \log n)$.
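To spell out why locating the cell yields the nearest neighbor, it may help to write down the definition of a Voronoi cell. The notation $V(t)$ is introduced here only for illustration:

$$V(t) = \{\, x \in \mathbb{R}^2 : \|x - t\| \le \|x - t'\| \ \text{for all } t' \in T \,\}, \qquad s \in V(t) \iff \|s - t\| = \min_{t' \in T} \|s - t'\|.$$

The cost then adds up as

$$\underbrace{O(|T| \log |T|)}_{\text{build and preprocess}} + \underbrace{|S| \cdot O(\log |T|)}_{\text{one query per } s} = O(n \log n), \qquad n = |S| + |T|.$$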

D.W.

I am expanding my comment into an answer, since I figured out a semi-satisfactory answer. This only solves the problem for $L^1$-distance. This answer is basically wrong.

This paper solves the problem of finding the closest pair of points in $d$ dimensions, for the case when the two sets are separated by a hyperplane, in $O(n \log^{d-1} n)$ time.

For two dimensions, this solves the case in the answer you reference as your primary motivation for your question in $O(n \log n)$ time. It can also be used to solve the general case of the 2D problem in $O(n \log^2 n)$ time.

Given two sets $S, T$ of points in 2D, embed them in 3D space, displacing set $S$ by some $-\delta_z$ and set $T$ by $\delta_z$ in the $z$ direction. The choice of $\delta_z$ can be made to not affect the choice of the closest pair of points by taking $\delta_z$ to be smaller than the precision of your input points (and doubling the precision bits for each input coordinate). Use the 3D algorithm from the cited paper.
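The reason the lifting preserves the answer is that, for $s = (x, y, -\delta_z)$ and $t = (x', y', \delta_z)$, the squared 3D distance is $(x - x')^2 + (y - y')^2 + 4\delta_z^2$, that is, the squared 2D distance plus a constant, and the plane $z = 0$ separates the two lifted sets. The sketch below checks this on a toy input. It assumes integer coordinates and $\delta_z = 1$ so the arithmetic is exact, and it uses a brute-force search as a stand-in for the cited paper's algorithm, which is not reproduced here. All function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance in any dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def closest_cross_pair(S, T):
    """Brute-force stand-in: return (squared distance, s, t)."""
    return min(((sq_dist(s, t), s, t) for s in S for t in T),
               key=lambda r: r[0])

def lift(points, z):
    """Embed 2-D points in 3-D at height z."""
    return [(x, y, z) for x, y in points]

delta_z = 1  # assumed offset; any nonzero value works in exact arithmetic
S = [(0, 0), (4, 1), (7, 7)]
T = [(2, 2), (5, 1), (9, 3)]

d2_plane, s2, t2 = closest_cross_pair(S, T)
d2_lifted, s3, t3 = closest_cross_pair(lift(S, -delta_z), lift(T, delta_z))
```

Here the lifted closest pair projects back onto the planar closest pair, and the two squared distances differ by exactly $4\delta_z^2$.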

aelguindy

This can be done very easily with KDTrees. For example:

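from sklearn.neighbors import KDTree
import numpy as np

n = 1000
a = np.random.random((n, 3))  # n random points (3-D here; use (n, 2) for the planar case)
b = np.random.random((n, 3))

# Index one set, then look up the nearest indexed point for every point of the other set.
tree = KDTree(a)
distances, neighbors = tree.query(b, k=1)

min_dist = distances.min()  # smallest cross-set distance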

The KDTree object finds each nearest neighbor by descending the tree, much like a binary search. The tree.query function also has a dualtree=True argument, which uses a tree for b as well and could be useful for larger inputs. But for values of $n \le 10000$, I found dualtree=False (the default) to be faster.

AlexD