2

Determining whether or not an array has duplicate entries has two straightforward solutions:

  • Build a hashset of entries, then search for elements in this hashset. This takes $\mathcal O(n)$ time and $\mathcal O(n)$ extra space.
  • Sort the array, then search for consecutive elements that are the same. This takes $\mathcal O(n \log n)$ time and $\mathcal O(1)$ extra space.

Is there an algorithm that can solve this problem with the best of both approaches, using only $\mathcal O(n)$ time and $\mathcal O(1)$ extra space?

Question Finding duplicate in immutable array in linear time and constant space is similar, but the solution to that question only works when the values come from the set $\{1, ..., n\}$; my values are large integers. Also unlike that question, I allow the input array to be modified and used as a workspace.

One strategy might be to attempt to turn the array into a kind of hashtable. However, this seems like it won't work because of the lack of empty space (which makes both moving objects into place hard, as well as getting $\mathcal O(1)$ queries.).

However, I suspect that this cannot actually be done, but I'm not sure how to go about proving it.

Curtis F
  • 1,063
  • 9
  • 17

1 Answers1

2

Suppose that your elements come from a domain of size $n^2$. Any algorithm using time $T$ and space $S$ corresponds to a branching program having depth $T$ and containing at most $T \cdot 2^S$ nodes. Theorem 6.13 of Time-space tradeoff lower bounds for randomized computation of decision problems shows that $$ T = \Omega\left(n \sqrt{\log \tfrac{n}{S + \log T}/\log\log \tfrac{n}{S + \log T}}\right). $$ In particular, if $S$ is $O(\log n)$ (which corresponds to your $O(1)$ extra space, assuming you cannot modify the input) then $T = \Omega(n\sqrt{\log n/\log\log n})$.

Yuval Filmus
  • 280,205
  • 27
  • 317
  • 514