I'm looking for an efficient algorithm to merge a list of overlapping intervals (each of which has data associated with it) into non-overlapping intervals. If two or more intervals overlap, the later one wins (i.e. later intervals shadow earlier ones).
In my case, the intervals actually come pre-sorted (by starting point), but of course I can't re-sort them in any other way because the intervals that come later in the input list dominate the earlier ones.
Other facts:
- the intervals may overlap but don't have to
- there may be gaps not covered by any intervals
This feels like a common problem, but I haven't found a standard algorithm for it yet.
Algorithm ideas
The naive algorithm (O(n^2)) is just to intersect each interval with every interval that follows, keeping only the 'non-occluded' pieces. But that's inefficient.
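To show what I mean by the naive approach, here is a minimal Python sketch. It assumes half-open intervals given as ((start, end), data) tuples, as in the examples; `subtract` and `merge_naive` are just names I made up for illustration.

```python
def subtract(pieces, cut):
    """Remove the half-open range `cut` from each (start, end) piece."""
    cut_start, cut_end = cut
    result = []
    for start, end in pieces:
        if cut_end <= start or end <= cut_start:
            result.append((start, end))      # no overlap, keep the piece whole
            continue
        if start < cut_start:
            result.append((start, cut_start))  # part sticking out on the left
        if cut_end < end:
            result.append((cut_end, end))      # part sticking out on the right
    return result


def merge_naive(intervals):
    """Clip every interval against all intervals that follow it in the input."""
    out = []
    for i, (span, data) in enumerate(intervals):
        pieces = [span]
        for later_span, _ in intervals[i + 1:]:
            pieces = subtract(pieces, later_span)
            if not pieces:
                break                        # fully occluded by later intervals
        out.extend((piece, data) for piece in pieces)
    # Pieces are produced per input interval, so sort them by position.
    return sorted(out, key=lambda item: item[0])
```

Calling `merge_naive([((1, 10), "green"), ((5, 8), "red")])` splits the green interval around the red one, as in Example 1 below.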
I've also thought about using a variation of the regular overlapping-intervals algorithm: basically, building a stack and always intersecting the next element with the top of the stack, then pushing the non-overlapping part of the old element and the overlapping part of the new element. But of course that potentially leaves a left-over piece from the old element. Essentially, ((1, 10), "1") and ((2, 5), "2") become ((1, 2), "1"), ((2, 5), "2") with a left-over of ((5, 10), "1"). That left-over is now unsorted with respect to the elements that follow, and I can't sort it into the input list either because other intervals need to dominate it.
Or maybe I should use an interval intersection algorithm to find each of the segments and then use a scan line algorithm to find the original interval that dominates each of the segments?
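To make the scan-line idea more concrete, here is a rough Python sketch of how I imagine it could work. It assumes half-open intervals given as ((start, end), data) tuples and uses the position in the input list as the priority (the highest index wins). `merge_sweep` is a hypothetical name, and I'm not claiming this is the canonical approach; I believe it runs in O(n log n) because of the sort and the heap.

```python
import heapq


def merge_sweep(intervals):
    """Sweep over boundary points; the active interval with the highest index wins."""
    # All distinct start/end coordinates in ascending order.
    points = sorted({p for (start, end), _ in intervals for p in (start, end)})
    # Input indices ordered by start point (a no-op if the input is pre-sorted).
    by_start = sorted(range(len(intervals)), key=lambda i: intervals[i][0][0])

    heap = []      # max-heap of active input indices, stored negated
    nxt = 0        # next position in by_start to activate
    last = None    # input index that produced the most recent output piece
    out = []
    for left, right in zip(points, points[1:]):
        # Activate every interval that starts at or before this segment.
        while nxt < len(by_start) and intervals[by_start[nxt]][0][0] <= left:
            heapq.heappush(heap, -by_start[nxt])
            nxt += 1
        # Lazily drop intervals on top of the heap that have already ended.
        while heap and intervals[-heap[0]][0][1] <= left:
            heapq.heappop(heap)
        if not heap:
            continue   # gap not covered by any interval
        winner = -heap[0]
        if out and last == winner and out[-1][0][1] == left:
            # Same winner as the adjacent previous piece: extend it.
            out[-1] = ((out[-1][0][0], right), out[-1][1])
        else:
            out.append(((left, right), intervals[winner][1]))
        last = winner
    return out
```

The heap only ever removes expired intervals from the top, so an expired interval buried deeper just stays there until it surfaces; that keeps each push and pop at O(log n).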
Examples:
Example 1:
- input:
[((1, 10), "green"), ((5, 8), "red")]
- output:
[((1, 5), "green"), ((5, 8), "red"), ((8, 10), "green")]
Example 2:
- input:
[((1,10), "1"), ((2, 4), "2"), ((3, 8), "3"), ((4, 7), "4"), ((5, 6), "5"), ((6, 9), "6")]
- output:
[((1, 2), "1"), ((2, 3), "2"), ((3, 4), "3"), ((4, 5), "4"), ((5, 6), "5"), ((6, 9), "6"), ((9, 10), "1")]
- visualised:
```
((1,10), "1")    1111111111
((2, 4), "2")    |222     |
((3, 8), "3")    ||333333 |
((4, 7), "4")    |||4444  |
((5, 6), "5")    ||||55   |
((6, 9), "6")    |||||6666|
                 ||||||||||
                 vvvvvvvvvv
===========================
merged           1234566661
```
