- Given a time-series of data collected from a single running process that takes 8 hours to complete:
| Minute | GB of Disk Space Used |
|---|---|
| 0 | 0 |
| 1 | 8 |
| 2 | 15 |
| 3 | 22 |
...Etc. It is sampled every minute for 8 hours. The number of GB used jumps up and down a lot and ends back at 0, like a big spiky mountain shape. I'm making a (large) assumption that running multiple instances of this process in parallel takes up proportionally more space, but that the instances don't slow each other down.
- I've also got a hard upper bound that the summed area graph (the total of all space usage at any given minute) can't go above.
My personal upper bound is 780GB (the amount of spare disk space). If I ran 53 processes in parallel starting at the same time, they would add up at minute 2 to (53 * 15) = 795GB (which is > 780), the disk would run out of space, and everything would crash. Or if I ran 36 in parallel, they would crash at minute 3 at (36 * 22) = 792GB. No crashing allowed!
- The goal is complete as many runs of the process as possible within the next 7 days, by "packing them in" and running them in some overlapping manner by deciding when each run of the process starts.
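To make the constraint concrete, here's a minimal sketch of the feasibility check. All names (`profile`, `total_usage`, `feasible`) are mine, and the profile values beyond minute 3 are made up for illustration:

```python
CAP_GB = 780  # spare disk space

def total_usage(profile, starts, horizon):
    """Summed GB per minute when runs begin at the given start minutes."""
    total = [0.0] * horizon
    for s in starts:
        for i, gb in enumerate(profile):
            if s + i < horizon:
                total[s + i] += gb
    return total

def feasible(profile, starts, horizon, cap=CAP_GB):
    """True if the summed usage never exceeds the cap at any minute."""
    return all(gb <= cap for gb in total_usage(profile, starts, horizon))

# Table values for minutes 0-3; the rest of the spiky mountain is hypothetical.
profile = [0, 8, 15, 22, 15, 8, 0]
print(feasible(profile, [0] * 53, 10))  # 53 * 15 = 795 > 780 at minute 2 -> False
```

A schedule is then just a list of start minutes that keeps `feasible` true over the whole 7-day horizon.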
First Attempt: Greedy
At first I thought it was a simple greedy problem: at every minute, ask "can I start a new process without the total of all running processes exceeding the limit at any point in the future?" If yes, great: start a new one. If not, wait a minute and try again. This leads to "start 3 processes immediately. Wait until they all finish. Then do it again for 7 days."
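The greedy idea above can be sketched like this (a sketch under my own naming assumptions, reusing a per-minute `profile` of one run's disk usage in GB):

```python
def greedy_schedule(profile, cap, horizon):
    """At each minute, keep starting runs while the cap holds at every future minute."""
    usage = [0.0] * (horizon + len(profile))  # summed GB per minute
    starts = []
    for t in range(horizon):
        # Would one more run starting at minute t ever exceed the cap?
        while all(usage[t + i] + gb <= cap for i, gb in enumerate(profile)):
            for i, gb in enumerate(profile):
                usage[t + i] += gb
            starts.append(t)
    return starts
```

With the toy profile `[0, 8, 15, 22, 15, 8, 0]` and a 780GB cap, this front-loads as many simultaneous starts at minute 0 as the peak allows, which is exactly the "start a batch, wait, repeat" behavior described.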
Second Attempt: recursive?
Done! Wait... that feels wrong, knapsack-packing style. What about "if you line up runs A and B just right, they make a valley you could fit C into"?
Which makes it feel like a recursive search problem. Is there an accepted way to optimize this sort of thing?
Third Attempt: Minimize the Maximum (like Tetris?)
My best guess at this point is a simple optimization strategy:
- Define a time bound (7 days)
- Place the next process start time at whatever point minimizes the "max space used on the drive, over all time".
- If there are ties, pick the soonest start time.
- Repeat the placement step until nothing more fits.
This would work out to:
- First process starts immediately.
- Second process starts before the first one finishes: wherever the downslope of the first one exceeds the upslope of the second one.
- Keep adding process start times until there isn't a place to slot in one more.
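The minimize-the-max loop described above might look like this (again a sketch with my own names; it brute-forces every candidate start minute per placement, so it's slow for a full 7-day horizon but shows the idea):

```python
def minimax_schedule(profile, cap, horizon):
    """Repeatedly place a run where it least raises the all-time peak usage."""
    usage = [0.0] * (horizon + len(profile))  # summed GB per minute
    starts = []
    while True:
        best_t, best_peak = None, None
        for t in range(horizon):
            candidate = usage[:]
            for i, gb in enumerate(profile):
                candidate[t + i] += gb
            peak = max(candidate)
            # Strict '<' plus increasing t means ties resolve to the soonest start.
            if peak <= cap and (best_peak is None or peak < best_peak):
                best_t, best_peak = t, peak
        if best_t is None:
            break  # no start time fits under the cap anymore
        for i, gb in enumerate(profile):
            usage[best_t + i] += gb
        starts.append(best_t)
    return starts
```

One design note: because each placement is committed permanently, this is still a greedy heuristic, just with a smarter scoring function than "does it fit", so it can interleave runs into each other's valleys without guaranteeing a globally optimal packing.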