Since the stored intervals never overlap, an interval tree is unnecessary. We store the intervals in an AVL tree $T$ keyed by start point and rely on the fact that bulk deletion of a contiguous run of keys $q_i, \dots, q_{i+k-1}$ can be done in $O(\log n + \log k)$ amortized time (see the non-open-access and open-access references listed at the end).
Let $x=[a,b)$ be a half-open interval, and define $start(x)=a$ and $end(x)=b$. We define $pred(x)$ and $succ(x)$ to be the stored intervals immediately before and after $x$ in start-point order. Both can be computed in $O(\log n)$ time, where $n$ is the number of intervals in the tree. The number of disjoint intervals is the number of intervals (keys) stored in $T$.
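As a minimal sketch of $pred$ and $succ$, assume the stored intervals are kept in two parallel Python lists `starts` and `ends`, sorted by start point. These lists are only a stand-in for the AVL tree $T$: `bisect` gives the same $O(\log n)$ search, but list updates do not have AVL costs. The tie convention (an interval with the same start as $x$ counts as its successor) and returning `None` when a neighbour is missing are assumptions.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def pred(starts: List[int], ends: List[int], x: Interval) -> Optional[Interval]:
    """Stored interval with the largest start strictly less than start(x)."""
    i = bisect_left(starts, x[0])  # first index with starts[i] >= start(x)
    return (starts[i - 1], ends[i - 1]) if i > 0 else None


def succ(starts: List[int], ends: List[int], x: Interval) -> Optional[Interval]:
    """Stored interval with the smallest start greater than or equal to start(x)."""
    i = bisect_left(starts, x[0])
    return (starts[i], ends[i]) if i < len(starts) else None
```

For example, with stored intervals $[1,3), [4,6), [7,9)$ and $x=[5,6)$, `pred` returns `(4, 6)` and `succ` returns `(7, 9)`.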
Insertion
Let us first examine what happens when we insert an interval $x$ into the tree. Using predecessor and successor queries, we can determine the range of stored intervals that $x$ overlaps in $O(\log n)$ time. Depending on that range, we perform one of the following three operations on $T$:
- Add a new disjoint interval.
- Extend an existing interval.
- Merge $k \geq 2$ existing intervals into one, possibly extending it.
(1a) The first case happens if the inserted interval $x$ overlaps none of the stored intervals, i.e. $$end(pred(x)) \leq start(x) \land start(succ(x)) \geq end(x)$$ Because the stored intervals are disjoint and sorted, checking these two neighbours is enough. Since no stored interval needs to be updated, we simply insert $x$ into $T$.
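The test for case (1a) can be sketched on the same sorted parallel lists `starts`/`ends` (a stand-in for $T$). The function name `is_disjoint_from_all` is hypothetical, and treating a missing predecessor or successor as satisfying its half of the condition is an assumption the formula leaves implicit.

```python
from bisect import bisect_left
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def is_disjoint_from_all(starts: List[int], ends: List[int], x: Interval) -> bool:
    """Case (1a): True if x overlaps no stored interval.

    The stored intervals are disjoint and sorted, so only the immediate
    predecessor and successor of x need to be checked.
    """
    a, b = x
    i = bisect_left(starts, a)  # succ(x) sits at index i, pred(x) at i - 1
    pred_ok = i == 0 or ends[i - 1] <= a          # end(pred(x)) <= start(x)
    succ_ok = i == len(starts) or starts[i] >= b  # start(succ(x)) >= end(x)
    return pred_ok and succ_ok
```

Because the comparisons use $\leq$ and $\geq$, an interval that only touches a neighbour, such as $[3,7)$ between $[1,3)$ and $[7,9)$, counts as disjoint.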
(2a) The second case happens if $x$ overlaps exactly one stored interval $q$ (this includes $x$ containing $q$ or being contained in it). To update the tree, we delete $q$ and insert $q'$, where $$q'=[\min(start(x), start(q)), \max(end(x), end(q)))$$
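Case (2a) can be sketched as follows, assuming the caller has already established that $x$ overlaps only the stored interval $q$ at index `j` of the sorted lists `starts`/`ends` (a stand-in for $T$). The helper name `replace_with_merge` is hypothetical; it mirrors the text by deleting $q$ and inserting $q'$ at its sorted position.

```python
from bisect import bisect_left
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def replace_with_merge(starts: List[int], ends: List[int], j: int, x: Interval) -> None:
    """Case (2a): x overlaps only q = (starts[j], ends[j]).

    Delete q and insert q' = [min(start(x), start(q)), max(end(x), end(q))).
    """
    a = min(x[0], starts[j])
    b = max(x[1], ends[j])
    del starts[j]                # delete q ...
    del ends[j]
    i = bisect_left(starts, a)   # ... and insert q' where it belongs
    starts.insert(i, a)
    ends.insert(i, b)
```

With stored intervals $[1,3), [4,8), [10,12)$, inserting $x=[5,9)$ only overlaps $[4,8)$, which becomes $[4,9)$.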
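(3a) In the third case, let $Q=\{q_i, q_{i+1}, \dots, q_{i+k-1}\}$ with $k \geq 2$ be the intervals that $x$ intersects. Because the stored intervals are sorted and disjoint, $Q$ is a contiguous run of keys in $T$: $q_i$ is $pred(x)$ if it overlaps $x$ and $succ(x)$ otherwise, and $q_{i+k-1}$ is the stored interval with the largest start point less than $end(x)$, found with one more predecessor query. We again update the tree by deleting all of $Q$ from $T$ and inserting a single interval $$q'=[\min(start(x), start(q_i)), \max(end(x), end(q_{i+k-1})))$$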
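Finding $Q$ can be sketched with two binary searches on the sorted lists `starts`/`ends` (a stand-in for $T$). This relies on an observation not stated above: because the stored intervals are disjoint and sorted by start, `ends` is sorted too. The function name `overlap_range` is hypothetical, and "overlap" here means a shared point, so intervals that merely touch $x$ are excluded, matching the $\leq$ in case (1a).

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def overlap_range(starts: List[int], ends: List[int], x: Interval) -> Tuple[int, int]:
    """Return (i, j) such that indices i..j-1 are exactly the stored intervals
    overlapping x, i.e. Q = {q_i, ..., q_{j-1}} and k = j - i.
    """
    a, b = x
    i = bisect_right(ends, a)   # first stored interval with end > start(x)
    j = bisect_left(starts, b)  # first stored interval with start >= end(x)
    return i, j
```

For stored intervals $[1,3), [4,6), [7,9)$, the interval $x=[2,8)$ gives `(0, 3)` (all three, $k=3$), while $x=[3,4)$ gives `(1, 1)` ($k=0$, since it only touches its neighbours).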
The first and second cases take $O(\log n)$ time. At first glance the third case seems to take $O(k\log n)$ time, which is undesirable since $k$ can be $\Theta(n)$ in the worst case. However, as shown in the references, bulk deletion of a contiguous range of keys $[L, R]=\{q_i, q_{i+1}, \dots, q_{i+k-1}\}$ can be done in amortized $O(\log n + \log k)$ time. Since $k \leq n$, this is $O(\log n)$, so insertion takes amortized $O(\log n)$ time. Note that each insertion changes the number of disjoint intervals in the tree by $1-k$ (with $k=0$ in case (1a) and $k=1$ in case (2a)), so this count can be updated in constant time.
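The three insertion cases can be folded into one code path that branches on $k$. The sketch below is hypothetical (the class `DisjointIntervals` is not from the answer) and uses sorted Python lists in place of the AVL tree; the slice deletion `del starts[i:j]` plays the role of the bulk deletion of $Q$, but it costs $O(n)$ on a list, so this reproduces the logic, not the amortized bound. Ignoring empty intervals is an assumption.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


class DisjointIntervals:
    """Sorted parallel lists standing in for the AVL tree T."""

    def __init__(self) -> None:
        self.starts: List[int] = []
        self.ends: List[int] = []
        self.count = 0  # number of disjoint intervals, updated in O(1)

    def insert(self, x: Interval) -> None:
        a, b = x
        if a >= b:
            return  # ignore empty intervals (assumption)
        i = bisect_right(self.ends, a)   # first stored interval with end > a
        j = bisect_left(self.starts, b)  # first stored interval with start >= b
        k = j - i                        # number of stored intervals x overlaps
        if k > 0:
            # Cases (2a) and (3a): merge x with q_i .. q_{i+k-1}.
            a = min(a, self.starts[i])
            b = max(b, self.ends[j - 1])
            # Remove the contiguous run Q (stand-in for the AVL bulk deletion).
            del self.starts[i:j]
            del self.ends[i:j]
        # Inserts x itself in case (1a) (k == 0), otherwise the merged q'.
        self.starts.insert(i, a)
        self.ends.insert(i, b)
        self.count += 1 - k
```

Inserting $[1,3), [4,6), [7,9)$ and then $[2,8)$ leaves the single interval $[1,9)$ with `count == 1`. A later insertion of $[9,10)$ only touches $[1,9)$, so it is kept as a separate interval, consistent with case (1a).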
Deletion
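When we delete an interval $x$, i.e. remove all of its points from the set, we either (if $x$ overlaps at least one stored interval):
- Split one interval into two, or
- Trim at most two intervals at the ends of the affected range and delete all intervals in between (if any).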
(1b) The first case happens if $x$ is nested within a single stored interval $q$, i.e.
$$start(q) \leq start(x) \land end(x) \leq end(q)$$ In this case we delete $q$ and insert the two resulting intervals $[start(q), start(x))$ and $[end(x), end(q))$, skipping either one if it is empty (for example when $start(q)=start(x)$).
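The split in case (1b) can be sketched as a small pure function on half-open tuples. The name `split` is hypothetical; it returns only the nonempty pieces, so it covers the situation where $x$ shares an endpoint with $q$ or equals it.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def split(q: Interval, x: Interval) -> List[Interval]:
    """Case (1b): remove x from q, where start(q) <= start(x) and end(x) <= end(q).

    Returns the nonempty pieces among [start(q), start(x)) and [end(x), end(q)).
    """
    assert q[0] <= x[0] and x[1] <= q[1], "x must be nested in q"
    pieces = [(q[0], x[0]), (x[1], q[1])]
    return [p for p in pieces if p[0] < p[1]]
```

For $q=[1,10)$, removing $[3,5)$ leaves $[1,3)$ and $[5,10)$, removing $[1,5)$ leaves only $[5,10)$, and removing $[1,10)$ leaves nothing.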
(2b) In the second case, let $Q=\{q_i, q_{i+1}, \dots, q_{i+k-1}\}$ be the intervals that $x$ overlaps. We delete all of $Q$ from the tree and reinsert the trimmed versions of $q_i$ and $q_{i+k-1}$ if they were not completely nested in $x$, namely $[start(q_i), start(x))$ if $start(q_i) < start(x)$ and $[end(x), end(q_{i+k-1}))$ if $end(q_{i+k-1}) > end(x)$. As in case (3a), this is a bulk deletion that can be done in amortized $O(\log n + \log k)$ time.
As with insertions, the number of disjoint intervals can be updated after each deletion in constant time: it changes by the number of reinserted pieces (at most two) minus $k$.
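Both deletion cases can be sketched in one function on the sorted lists `starts`/`ends` (a stand-in for $T$), returning the change in the number of disjoint intervals. The function name `delete` is hypothetical, the slice deletion stands in for the AVL bulk deletion (and is $O(n)$ on a Python list), and treating an empty $x$ as a no-op is an assumption. Case (1b) falls out as $k=1$ with both ends trimmed.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def delete(starts: List[int], ends: List[int], x: Interval) -> int:
    """Remove the points of x from the stored disjoint intervals.

    Returns the change in the number of disjoint intervals.
    """
    a, b = x
    if a >= b:
        return 0  # empty x: nothing to remove (assumption)
    i = bisect_right(ends, a)   # first stored interval with end > start(x)
    j = bisect_left(starts, b)  # first stored interval with start >= end(x)
    k = j - i                   # number of stored intervals x overlaps
    if k == 0:
        return 0
    # Keep the parts of q_i and q_{i+k-1} that stick out of x, if any.
    kept: List[Interval] = []
    if starts[i] < a:
        kept.append((starts[i], a))
    if ends[j - 1] > b:
        kept.append((b, ends[j - 1]))
    # Remove the contiguous run Q (stand-in for the AVL bulk deletion).
    del starts[i:j]
    del ends[i:j]
    for offset, (s, e) in enumerate(kept):
        starts.insert(i + offset, s)
        ends.insert(i + offset, e)
    return len(kept) - k
```

Deleting $[2,10)$ from $[1,4), [5,8), [9,12)$ leaves $[1,2), [10,12)$ and returns $-1$; deleting $[3,5)$ from $[0,10)$ splits it into $[0,3), [5,10)$ and returns $+1$.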
References
- Amortized Complexity of Bulk Updates in AVL-Trees
- Bulk Updates and Cache Sensitivity in Search Trees