I apologize if this is the wrong place or too trivial a question for this community. What is the best data structure for storing a time-windowed streaming graph so that statistics over all of its nodes (e.g., a running average degree) can be computed quickly?
I believe the best way to describe this is as follows. Let $G=(V,E)$ be a sparse, time-evolving network modeled as an undirected graph with $n$ nodes and $m \geq n$ edges, observed at time points (in hours) $t = t_0, t_1, \dots$. Suppose further that at any time point $t_i$, every edge that is more than $k$ hours old is removed, and every node left with no edges is removed as well.
My idea (e.g., for the average degree) is as follows: keep an array of edge arrival times together with a degree array of size $n$, where each element stores the current degree of the corresponding node. At each new time point $t_i$, increment the degree entries of the two endpoints of the new edge. Then remove all edges that are older than $k$ hours (i.e., added before $t_i-k$), decrementing the degrees of their endpoints, and remove any node that is left without edges. At every time point, the running average degree is computed by averaging the degree array.
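The scheme above can be sketched in Python as follows. This is a minimal sketch under a few assumptions: edges arrive as `(t, u, v)` tuples with non-decreasing `t` (in hours), a `deque` plays the role of the array of edge arrival times, and a dictionary stands in for the size-$n$ degree array so that edge-less nodes can simply be deleted. The class name `WindowedDegreeTracker` is hypothetical.

```python
from collections import deque


class WindowedDegreeTracker:
    """Hypothetical sketch of the idea above: a queue of edge arrival
    times plus a per-node degree table, with a window of k hours.
    Assumes edges arrive as (t, u, v) with non-decreasing t."""

    def __init__(self, k):
        self.k = k
        self.edges = deque()  # (arrival_time, u, v), oldest first
        self.degree = {}      # node -> current degree; edge-less nodes are absent

    def add_edge(self, t, u, v):
        # Increment the degree of both endpoints of the new edge.
        for node in (u, v):
            self.degree[node] = self.degree.get(node, 0) + 1
        self.edges.append((t, u, v))
        self.expire(t)

    def expire(self, now):
        # Remove edges added before now - k, decrementing their endpoints.
        while self.edges and self.edges[0][0] < now - self.k:
            _, u, v = self.edges.popleft()
            for node in (u, v):
                self.degree[node] -= 1
                if self.degree[node] == 0:
                    del self.degree[node]  # drop nodes left without edges

    def average_degree(self):
        # O(n) pass over the degree table, as in the original idea.
        if not self.degree:
            return 0.0
        return sum(self.degree.values()) / len(self.degree)
```

For example, with `k=2`, adding edges `a-b` at hour 0 and `b-c` at hour 1 gives degrees 1, 2, 1 (average 4/3); adding `c-d` at hour 4 expires both earlier edges, leaving only `c` and `d` with degree 1 each. `expire` can also be called on its own at a time point with no new edge.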
If I'm not mistaken, this algorithm takes $O(n)$ time per time point and $O(n)$ space. Is there a better way of doing this?
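One possible refinement, assuming the statistic of interest is specifically the average degree: by the handshake lemma, the sum of the degrees equals twice the number of edges, so the average can be written without a pass over the degree array. Here $V_t$ and $E_t$ are notation introduced for this sketch, denoting the nodes and edges currently inside the window:

$$
\bar{d}_t \;=\; \frac{1}{|V_t|}\sum_{v \in V_t} \deg_t(v) \;=\; \frac{2\,|E_t|}{|V_t|}.
$$

Both counts change by $O(1)$ whenever an edge arrives or expires, and each edge is added once and expired at most once, so keeping $|E_t|$ and $|V_t|$ as counters would make each average-degree query $O(1)$ with amortized $O(1)$ update cost per edge. In code, this amounts to dividing twice the number of unexpired edges by the number of stored nodes. A per-node degree table is still needed to detect when a node's degree drops to zero.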
Previous posts, such as this one, recommend adjacency lists as the best data structure. Additionally, is there any advantage to using disjoint-set data structures, as in the post here?
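For comparison, a windowed adjacency list could look roughly like the sketch below. It assumes the same `(t, u, v)` input with non-decreasing `t` and a window of `k` hours; the class name `WindowedAdjacency` and its methods are hypothetical. Each node maps to a `collections.Counter` of neighbors, so repeated edges between the same pair expire one at a time.

```python
from collections import Counter, defaultdict, deque


class WindowedAdjacency:
    """Hypothetical sketch: adjacency lists for a sliding-window graph.
    Assumes (t, u, v) edges with non-decreasing t and a window of k hours.
    Neighbor multiplicities are stored so repeated edges expire correctly."""

    def __init__(self, k):
        self.k = k
        self.edges = deque()             # (arrival_time, u, v), oldest first
        self.adj = defaultdict(Counter)  # node -> Counter of neighbors

    def add_edge(self, t, u, v):
        self.adj[u][v] += 1
        self.adj[v][u] += 1
        self.edges.append((t, u, v))
        self.expire(t)

    def expire(self, now):
        # Pop edges older than k hours and undo both adjacency entries.
        while self.edges and self.edges[0][0] < now - self.k:
            _, u, v = self.edges.popleft()
            for a, b in ((u, v), (v, u)):
                self.adj[a][b] -= 1
                if self.adj[a][b] == 0:
                    del self.adj[a][b]
                if not self.adj[a]:
                    del self.adj[a]  # node has no edges left

    def degree(self, v):
        # .get avoids creating an empty entry in the defaultdict.
        nbrs = self.adj.get(v)
        return sum(nbrs.values()) if nbrs else 0

    def neighbors(self, v):
        return set(self.adj.get(v, ()))
```

Compared with a plain degree array, this keeps enough information to answer neighborhood queries, at the cost of space proportional to the number of edges in the window. On the disjoint-set question, one consideration is that a standard union-find structure supports merging sets but not splitting them, so on its own it cannot undo an edge expiry.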