
Are there any existing algorithms/strategies to test if an integer value is within the bounds of any of a list of intervals, where each interval is denoted by a [min, max] pair?

The naive approach is very straightforward, but has O(intervals) test performance:

list intervals (min, max)

func testVal(val){
  foreach (interval(min, max) in intervals)
    // val is covered as soon as one interval contains it
    if (val >= min) && (val <= max) return true
  return false
}

In the problem I'm facing the list is not updated frequently, but it is tested against with very high frequency.

Thus, it's acceptable for updating the list to be expensive, but tests need to be sublinear, ideally constant time.

One option is to insert every value in the intervals into a hashtable, but this is memory-intensive and not practical when the intervals get large. Memory usage needs to be proportional to the number of intervals, not the sum of the sizes of the intervals (probably ok to have a log(n) memory usage term thrown in if it greatly aids performance).

I've started poking around with various tree-based options to make this better, but it feels like a general enough problem that I suspect there is preexisting work I should be referencing. Are there any existing solutions to get sublinear tests (ideally constant time) on a list of intervals by preprocessing them into some other datastructure(s)?

Techrocket9

2 Answers


Here's my thought process on this problem:

  • If you only care about "in or out", you can reduce the list of intervals so that no intervals overlap, and the intervals are in sorted order.
    • Sorting by leftmost point takes $O(n \log n)$, then replacing any overlapping pairs with their union takes $O(n)$. But you've said you don't care how long this part takes.
  • For a given point and a given interval, you can determine whether the point is "outside to the left", "inside", or "outside to the right" in $O(1)$.
    • This assumption only works as long as you have a fixed word length, but that's generally true for all practical purposes.
    • Let's change the terms: "outside to the left" is "less than", "inside" is "equal to", and "outside to the right" is "greater than".
  • Now you can use standard binary search!
    • Since each comparison is $O(1)$, the total time is $O(\log n)$.

And voilà, each check takes only logarithmic time! Setting up the list in the first place takes $O(n \log n)$, and adding a new interval to the list later takes $O(n)$ in the worst case.
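The steps above can be sketched roughly as follows in Python. This is only an illustration under my own assumptions: the names `merge_intervals` and `IntervalSet` are made up for this example, intervals are closed `(min, max)` pairs, and the standard-library `bisect` module does the binary search over the sorted start points.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Sort by left endpoint, then fuse overlapping intervals: O(n log n)."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return merged


class IntervalSet:
    """Hypothetical helper: membership test over disjoint, sorted intervals."""

    def __init__(self, intervals):
        merged = merge_intervals(intervals)
        self.starts = [lo for lo, _ in merged]
        self.ends = [hi for _, hi in merged]

    def __contains__(self, val):
        # Find the last interval whose start is <= val, in O(log n).
        i = bisect_right(self.starts, val) - 1
        # val is covered only if that interval also ends at or after val.
        return i >= 0 and val <= self.ends[i]


s = IntervalSet([(1, 5), (3, 8), (10, 12)])
print(4 in s, 9 in s)
```

Memory stays proportional to the number of merged intervals, since only two flat lists of endpoints are stored.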

Draconis

You need an interval tree, which queries in $O(\log n + m)$ time, where $m$ is the number of intervals the point hits (as opposed to $O(n)$ for $n$ intervals in the data structure).

This should produce enough savings that you'll hardly notice the cost of $m$.

This output-sensitive algorithm is likely the best you can do. The other answer suggests this structure without naming it, with the additional step of making all intervals disjoint to get $O(\log n)$ ($m$ is then guaranteed to be at most 1). That's fine, but I'd still use an established data structure.
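For reference, here is a minimal sketch of one common variant, the centered interval tree, in Python. The names `Node` and `stab` are my own for this example, intervals are assumed to be closed `(min, max)` tuples, and the tree is static (built once, no inserts). In practice an existing library implementation is probably the better choice.

```python
class Node:
    """One node of a static centered interval tree (illustrative only)."""

    def __init__(self, intervals):
        # Use the median endpoint as the center so both subtrees stay small.
        points = sorted(p for iv in intervals for p in iv)
        self.center = points[len(points) // 2]
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        here = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        # Intervals containing the center, kept in two sort orders.
        self.by_lo = sorted(here)                                   # ascending min
        self.by_hi = sorted(here, key=lambda iv: iv[1], reverse=True)  # descending max
        self.left = Node(left) if left else None
        self.right = Node(right) if right else None


def stab(node, x):
    """Return every stored interval that contains x."""
    out = []
    while node is not None:
        if x < node.center:
            # Every interval here ends at or after center > x, so only min matters.
            for iv in node.by_lo:
                if iv[0] > x:
                    break
                out.append(iv)
            node = node.left
        elif x > node.center:
            # Every interval here starts at or before center < x, so only max matters.
            for iv in node.by_hi:
                if iv[1] < x:
                    break
                out.append(iv)
            node = node.right
        else:
            out.extend(node.by_lo)  # x is the center: all intervals here contain it
            break
    return out


tree = Node([(1, 5), (3, 8), (10, 12), (20, 25)])
print(stab(tree, 4))
```

If only a yes/no answer is needed, the query can return as soon as the first containing interval is found instead of collecting all $m$ of them.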