A common way to resolve a dependency graph is to compute an execution order, and then execute each stage in turn - storing and fetching the resources as necessary. In this example, when executing stage E we know that stage B is present in the cache due to the execution order. At the end of execution, the output of every operation is present in the cache.
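For concreteness, that naive scheme looks something like this Python sketch (with `run` standing in for whatever actually executes one operation, and dependencies given as a dict of sets):

    from graphlib import TopologicalSorter

    def execute_naive(deps, run):
        # deps: {node: set of dependencies}; run(node, inputs) -> output.
        cache = {}
        # static_order() yields each node only after all its dependencies.
        for node in TopologicalSorter(deps).static_order():
            inputs = [cache[d] for d in deps[node]]
            cache[node] = run(node, inputs)
        return cache  # the output of every operation is still resident

Nothing is ever evicted here, which is exactly what breaks down below.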

This works fine if your cache is big enough to store the output of every operation in the graph, but that is not always the case. In my case the number of operations may approach 10,000 or so (brush strokes in a painting, for the curious), while the number of items that can be stored in the cache is only around 100 (textures on the GPU). Fortunately the graph has lots of long chains, so it should be solvable.
So far, four questions have come out of this situation:
1) How to compute an execution order?
Given that the cache is of finite size, not all execution orders are valid. Imagine that we only have two available resources and consider:
If we try to execute B and C first (both of which can be computed immediately), then we get stuck: we cannot compute A for lack of free resources, and we cannot compute D or E because their dependencies are unfulfilled.
Operation:   B  |  C  |  A  |  ?
---------------------------------->
In Cache:       |  B  | B,C |  ??
Of course, if we execute A -> B -> D -> C -> E then we can complete execution of the graph. (The outputs of A and B can be removed once D has been computed, and so on.)
Operation:   A  |  B  |  D  |  C  |  E
--------------------------------------------->
In Cache:       |  A  | A,B |  D  | C,D |  E
                               ^
                               |
                 Removed A and B to make room for D
This implies that a naive iterative implementation (for each node, check whether all of its dependencies are ready, and if so, run it) will get stuck in cases where resources are low.
So, how can I derive an execution order that minimizes the peak number of resources allocated at any given time?
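The best I have come up with so far is brute force: a backtracking search that at each step tries every node whose inputs are all cached, evicts outputs once their last dependant has run (keeping sinks as final outputs), and backs out when every choice would exceed the budget. A Python sketch, where `find_order` and the between-operations accounting are my own, chosen to match the timelines above:

    def find_order(deps, capacity):
        # deps: {node: set of dependencies}.  Returns an execution order
        # whose cache occupancy (counted between operations, as in the
        # timelines above) never exceeds `capacity`, or None if stuck.
        dependants = {n: set() for n in deps}
        for n, ds in deps.items():
            for d in ds:
                dependants[d].add(n)

        def search(order, cache, done):
            if len(done) == len(deps):
                return order
            for n in deps:
                if n in done or not deps[n] <= cache:
                    continue  # already run, or some input is missing
                done2 = done | {n}
                # Evict anything whose dependants have all been computed;
                # sinks (no dependants) are final outputs and stay cached.
                cache2 = frozenset(
                    c for c in cache | {n}
                    if not dependants[c] or not dependants[c] <= done2)
                if len(cache2) <= capacity:
                    found = search(order + [n], cache2, done2)
                    if found is not None:
                        return found
            return None  # every runnable choice exceeds the budget

        return search([], frozenset(), frozenset())

Using the graph from the example above (as I read it from the timelines), this finds the good order and rejects the impossible budget:

    deps = {'A': set(), 'B': set(), 'C': set(),
            'D': {'A', 'B'}, 'E': {'C', 'D'}}
    print(find_order(deps, 2))  # ['A', 'B', 'D', 'C', 'E'] (a valid order)
    print(find_order(deps, 1))  # None: no order fits in one slot

This is exponential in the worst case, though, so at 10,000 nodes it is only a statement of the problem, not a solution.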
2) How to count how many resources are required for graph execution?
Evidently there is some limit governing whether a graph is computable with a given amount of resources. For example, the first graph requires three resources and the second requires at least two; in those cases it is because one of the nodes has that many inputs. But this isn't always the case. Consider:
In this case every node has only two inputs, but execution of the graph requires three resources.
I guess this is highly related to actually solving the graph, but it may be separable. Either way, it would be nice to be able to say up front "this long and expensive graph that will take 6 hours to run will complete successfully and not run out of resources halfway through".
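Under the same accounting, a minimum could at least be defined by trying budgets in increasing order against the `find_order` sketch from question 1, although this is exactly the sort of brute force I would like to avoid:

    def min_resources(deps):
        # Smallest capacity for which some valid order exists.  len(deps)
        # always suffices (store everything), so the loop terminates.
        for capacity in range(1, len(deps) + 1):
            if find_order(deps, capacity) is not None:
                return capacity

    print(min_resources(deps))  # 2 for the example graph above

What I am hoping for is a way to bound or compute this number without searching.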
3) How to tell when a resource can be removed from the cache?
I guess one approach is to have each item in the cache count how many times it has been used, and remove it when that count equals its number of dependants. But it would be nice to compute two associated arrays up front: one giving the order of operations to execute, and another giving, for each operation, which items in the cache are no longer required.
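A sketch of that reference-counting pass, which turns any execution order into the second of those two arrays (final outputs are never anyone's input, so they are never freed):

    def eviction_schedule(deps, order):
        # For each step of `order`, list the cached outputs that become
        # removable once that operation has run.
        remaining = {n: 0 for n in deps}  # dependants not yet executed
        for ds in deps.values():
            for d in ds:
                remaining[d] += 1
        evictions = []
        for n in order:
            freed = []
            for d in deps[n]:
                remaining[d] -= 1
                if remaining[d] == 0:  # that was d's last dependant
                    freed.append(d)
            evictions.append(freed)
        return evictions

    print(eviction_schedule(deps, ['A', 'B', 'D', 'C', 'E']))
    # [[], [], ['A', 'B'], [], ['C', 'D']] -- matches the timeline above
    # (order within each step may vary, since deps values are sets)

This does the counting up front rather than at run time, which I think is what I want.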
4) Are there any existing resources/studies on solving this problem?
I've had a look through some books on graph theory and done some browsing of the web, but haven't found anything that describes this problem. It is likely that I just don't know the terminology involved, so suggestions for reading are welcome. In fact, I would love it if someone could say "it's known as problem X and there are dozens of papers on the topic".