7

On 7 June 2023, Google DeepMind published an article about AlphaDev. AlphaDev is an evolution of AlphaZero (the system used to beat world champions at Go, chess and shogi), and it can produce assembly code for a task if that task is represented as a game. All of this information can be found here.

They tested AlphaDev on sorting algorithms, and it discovered new algorithms as well as improvements to existing ones. Here are the improvements over the original code, shown in assembly pseudo-code. The two improved routines sort sequences of 3 and 4 elements.

Can someone explain (or link to resources explaining) what concretely is different and why the specific modification in assembly code in the figure is an improvement to the sorting algorithm?

D.W.
nico263nico

6 Answers

23

This is not a new sorting algorithm. It's much more interesting than that. AlphaDev appears to have produced a new technique for superoptimisation.

You can think of the superoptimisation problem as taking a code fragment and finding the optimal sequence of instructions that does the same thing. The idea goes back at least to the 1970s; tools like BONSAI would start with a description of machine code expressed as some kind of formal logic, and then use heuristic search to find optimal code sequences.
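
For intuition, here is a minimal sketch of the exhaustive-search flavour of superoptimisation, over a toy five-instruction register machine and a made-up specification (compute y − x). The instruction set, the test-based check, and the target are all invented for this illustration; a real tool would also formally verify any candidate rather than trusting a handful of tests.

```cpp
// Toy exhaustive-search "superoptimiser". The five-instruction machine and
// the target function (y - x) are invented for this sketch.
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

enum Op { ADD, SUB, NEG0, SWAP, MOV10, NUM_OPS };
const char* kNames[NUM_OPS] = {"add r0,r1", "sub r0,r1", "neg r0",
                               "swap r0,r1", "mov r1,r0"};

// Run a program on registers (r0, r1); the result is whatever ends up in r0.
int32_t run(const std::vector<Op>& prog, int32_t r0, int32_t r1) {
    for (Op op : prog) {
        switch (op) {
            case ADD:   r0 += r1;          break;
            case SUB:   r0 -= r1;          break;
            case NEG0:  r0 = -r0;          break;
            case SWAP:  std::swap(r0, r1); break;
            case MOV10: r1 = r0;           break;
            default:                       break;
        }
    }
    return r0;
}

// Specification to match: f(x, y) = y - x, with x in r0 and y in r1.
int32_t reference(int32_t x, int32_t y) { return y - x; }

// Quick check against the spec on a handful of inputs (a stand-in for the
// formal verification a real superoptimiser would also perform).
bool matches(const std::vector<Op>& prog) {
    const std::pair<int32_t, int32_t> tests[] = {
        {0, 0}, {1, 2}, {-3, 7}, {100, -5}, {42, 42}};
    for (const auto& t : tests)
        if (run(prog, t.first, t.second) != reference(t.first, t.second))
            return false;
    return true;
}

int main() {
    // Enumerate programs in order of length, so the first match found is a
    // shortest program for the spec under this instruction set.
    for (int len = 1; len <= 4; ++len) {
        std::vector<Op> prog(len, ADD);
        while (true) {
            if (matches(prog)) {
                for (Op op : prog) std::printf("%s\n", kNames[op]);
                return 0;
            }
            // Odometer-style increment over NUM_OPS^len candidates.
            int i = 0;
            while (i < len && prog[i] == NUM_OPS - 1) prog[i++] = ADD;
            if (i == len) break;
            prog[i] = static_cast<Op>(prog[i] + 1);
        }
    }
    std::printf("no program of length <= 4 found\n");
    return 0;
}
```

Even this toy version makes the core problem obvious: the search space grows as (number of instructions)^(sequence length), which is why exhaustive search only works for very short fragments.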

The term "superoptimisation" was introduced by Henry Massalin in 1987. His was the first system to use exhaustive search, but it was made practical by using a limited subset of machine instructions. For many years, GCC was tuned using superoptimisations discovered by GNU superopt.

Superoptimisation is useful for discovering instruction selection templates and peephole optimisations, but the huge search space makes full optimisation prohibitive. Modern algorithms tend to use stochastic search and goal-directed search to allow for more instructions and longer instruction sequences.

What AlphaDev appears to have achieved is the start of a new way to get even longer instruction sequences: use deep learning techniques to propose sequences, which would then presumably be automatically verified using a theorem prover like Z3.
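
To illustrate the verification half of such a pipeline, here is a small sketch using Z3's C++ API (z3++.h). It checks a rewrite related to the sort3 change discussed elsewhere in this thread: that min(A, B) agrees with min(min(A, C), B) whenever B ≤ C already holds. The encoding is my own illustration of how a proposed shorter computation could be discharged by a prover, not DeepMind's actual verification setup.

```cpp
// Sketch: proving a candidate rewrite equivalent to the original under a
// known precondition, using Z3's C++ API.
// Build (assuming Z3 is installed): g++ verify.cpp -lz3
#include <iostream>
#include <z3++.h>

int main() {
    z3::context ctx;
    z3::expr A = ctx.bv_const("A", 32);
    z3::expr B = ctx.bv_const("B", 32);
    z3::expr C = ctx.bv_const("C", 32);

    // Signed minimum over 32-bit values, expressed with an if-then-else.
    auto smin = [&](const z3::expr& x, const z3::expr& y) {
        return z3::ite(x < y, x, y);  // '<' on bitvectors is signed here
    };

    z3::expr original  = smin(smin(A, C), B);  // min(A, B, C) the long way
    z3::expr candidate = smin(A, B);           // proposed shorter computation
    z3::expr precond   = B <= C;               // established by earlier code

    // Ask Z3 for a counterexample: inputs satisfying the precondition on
    // which the two expressions disagree. "unsat" means none exists, i.e.
    // the rewrite is correct under the precondition.
    z3::solver s(ctx);
    s.add(precond && original != candidate);
    switch (s.check()) {
        case z3::unsat:
            std::cout << "verified: candidate == original whenever B <= C\n";
            break;
        case z3::sat:
            std::cout << "counterexample:\n" << s.get_model() << "\n";
            break;
        default:
            std::cout << "solver returned unknown\n";
    }
    return 0;
}
```

The appeal of this division of labour is that the learned model only has to be a good proposer; soundness comes entirely from the prover.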

This is a very interesting approach. Forget sorting algorithms; it could revolutionise the way we construct compiler back-ends. It won't be feasible to run an ANN plus a theorem prover on every compilation any time soon, but compiler developers could take larger patterns found in real code and generate special-purpose optimisations for them.

Pseudonym
12

Check out Grady Booch's tweet on the matter:

there is no "new sorting algorithm" here.

Coming up with assembly-level tricks is not the same as finding a new sorting algorithm. That's my two cents on it.

ExpressionCoder
6

The article itself says it: they focused on short sequences of $3$ to $5$ elements using the sorting-network technique, and they claim a $70\%$ speedup on these cases. But when you look closely at the detailed benchmarks, you see that $70\%$ is a single extreme case; speedups are more commonly around $+30\%$ or $0\%$, and even go down to $-35\%$.

Even more interesting is the fact that the speedup is only $+1.7\%$ on sequences of more than $250000$ elements (also a best case?). Last but not least, this "performance" is only available when sorting integers, a pretty ad hoc case.

IMO, much fuss about nothing as regards the improvement of sorting algorithms, and even assembly optimization in general.

6

The improvement in the figure consists of the removal of one instruction. In a branch-less assembly program, this usually leads to a performance improvement. How much of a practical improvement it would be depends on how the subroutine is used.

Of course, the program should still be correct after this modification. The figure alone doesn't make this very clear, but this fragment describes only the final part of the implementation of a sorting network for 3 elements (the part inside the red ellipse). The first part guarantees that $B\leq C$, as noted in the paper. So, in the second part, instead of computing $\min(\min(A,C),B) = \min(A,B,C)$ for $P$ as in the program on the left, the program on the right computes $\min (A,B) = \min(A,B,C)$ (since $B\leq C$). This saves one assignment of $\min(A,C)$ to $P$ (initially, $P=A$), which is the instruction that is removed.
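
To make the min/max argument above concrete, here is a hedged C++ sketch of a branchless 3-element sorting network written with std::min/std::max (which compilers typically lower to conditional moves). It is a paraphrase of the structure described above, not the exact assembly from the figure; the comment marks the min that the change makes unnecessary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Branchless sort of three values, structured like the sorting network
// discussed above. This is a C-level paraphrase of the figure, not the
// exact assembly; std::min/std::max typically compile to cmov-style code.
void sort3(int32_t& a, int32_t& b, int32_t& c) {
    // Part 1: compare-exchange on (b, c). Afterwards b <= c is guaranteed.
    int32_t lo = std::min(b, c);
    int32_t hi = std::max(b, c);
    b = lo;
    c = hi;

    // Part 2: place the overall minimum in the first slot.
    //   Original (as described above): p = min(min(a, c), b)  -- extra min
    //   Improved:                      p = min(a, b)          -- valid since b <= c
    int32_t p = std::min(a, b);

    // The remaining two values go into the middle and last slots.
    int32_t mx = std::max(a, b);   // the larger of a and b
    a = p;
    b = std::min(mx, c);
    c = std::max(mx, c);
}

int main() {
    int32_t a = 5, b = 1, c = 3;
    sort3(a, b, c);
    assert(a == 1 && b == 3 && c == 5);
    return 0;
}
```

Dropping the extra min is exactly the "one assignment of $\min(A,C)$ to $P$" saving described above: one fewer data-dependent operation on the critical path, with everything else unchanged.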

Discrete lizard
2

As written in other answers, the contributions of AlphaDev are:

  • A branch-less implementation of sort for short vectors (from which most of the speedup over std::sort comes); a sketch of the branchless idea is shown after this list
  • A reduction in the number of instructions to be executed.
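
As promised above, a short illustrative sketch of what "branch-less" means here: a compare-exchange written with a branch versus one written with min/max (which compilers typically turn into conditional moves), plus a 4-element sorting network built from the latter. This is my own illustration of the style, not code taken from AlphaDev or from the write-ups linked below.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Branching compare-exchange: correct, but the branch is unpredictable on
// random data, so the CPU's branch predictor frequently mispredicts.
inline void cswap_branchy(int32_t& a, int32_t& b) {
    if (a > b) std::swap(a, b);
}

// Branchless compare-exchange: std::min/std::max typically lower to
// conditional-move (cmov) instructions, so there is no branch to mispredict.
inline void cswap_branchless(int32_t& a, int32_t& b) {
    int32_t lo = std::min(a, b);
    int32_t hi = std::max(a, b);
    a = lo;
    b = hi;
}

// A 4-element sorting network built from branchless compare-exchanges,
// in the spirit of the short fixed-size sorts discussed in this thread.
inline void sort4(int32_t v[4]) {
    cswap_branchless(v[0], v[1]);
    cswap_branchless(v[2], v[3]);
    cswap_branchless(v[0], v[2]);
    cswap_branchless(v[1], v[3]);
    cswap_branchless(v[1], v[2]);
}

int main() {
    int32_t v[4] = {4, 2, 9, 1};
    sort4(v);
    assert(v[0] == 1 && v[1] == 2 && v[2] == 4 && v[3] == 9);
    return 0;
}
```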

In the meantime, others have provided even shorter/faster instruction sequences than AlphaDev, either developed by hand, or using a combination of human work and machine-based tools. See:

https://export.arxiv.org/abs/2307.14503v1

https://www.mimicry.ai/faster-sorting-beyond-deepminds-alphadev

Disclaimer: I was involved with the work on the Mimicry side, so take that with a grain of salt.

The key achievement of AlphaDev is that it found its improvement largely without human input (depending on how you weigh the setup needed to get AlphaDev to solve this particular problem). It remains to be seen how quickly AI alone can beat humans at coming up with great code.

1

I can’t properly explain what the concrete differences are here or why they work; I’ll leave that to someone who is an expert on sorting algorithms. The comments in the pseudo-code, though, do explain somewhat tersely what each instruction does in high-level terms, so looking through those (and possibly experimenting in some language where you can step through the individual operations and watch the variables change) is probably a solid starting point for understanding how they work.

I can, however, rather succinctly explain why they are an improvement in just two words: fewer instructions.

The updated code uses fewer instructions than the original, and it introduces no instructions the original did not already use. This means that, barring unusual speculative-execution or caching behavior, the new code will run faster than the old code on identical hardware. The big thing that makes this essentially a certainty is that the changes only remove operations rather than replacing them with different ones. In other words, the timing of every other part of the code remains (theoretically) unchanged, so the algorithm as a whole runs faster.