
When we test a new optimization algorithm, what is the process we need to follow? For example, do we need to run the algorithm several times and pick the best performance (e.g., in terms of accuracy, F1 score, etc.), and do the same for the old optimization algorithm? Or do we need to compute the average performance, i.e., the average accuracy or F1 score over these runs, to show that the new algorithm is better than the old one? When I read papers on new optimization algorithms, I don't know how they calculate the performance and draw the training-loss vs. iterations curves, because there are random effects, and different runs may give different performance and different curves. So do we compare the best performance or the average performance?

user82620

1 Answer


Whenever possible, you should use the average performance when comparing different methods, and preferably also report the standard deviation across the different runs (see this question for an example of why it matters sometimes). It's perfectly fine to also provide the best performance; ideally you can even present a boxplot comparison of the different methods.
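As an illustration, here is a minimal sketch in Python (assuming NumPy and Matplotlib are available; the `train_and_evaluate` function is a hypothetical placeholder for training a model with a given optimizer and seed, and the scores it returns are dummy values) of how you could report the mean, standard deviation, and a boxplot of a metric over several runs:

```python
# Sketch: compare two optimizers by aggregating a metric over several seeded runs.
import numpy as np
import matplotlib.pyplot as plt

def train_and_evaluate(optimizer_name, seed):
    # Hypothetical placeholder: train a model with the given optimizer and seed,
    # then return a test metric such as accuracy or F1 score.
    rng = np.random.default_rng(seed)
    base = 0.80 if optimizer_name == "old" else 0.82  # dummy values for illustration
    return base + rng.normal(scale=0.01)

seeds = range(10)  # several independent runs
scores = {
    name: np.array([train_and_evaluate(name, s) for s in seeds])
    for name in ("old", "new")
}

# Report mean +/- standard deviation for each method, not just the best run.
for name, vals in scores.items():
    print(f"{name}: mean={vals.mean():.4f}, std={vals.std(ddof=1):.4f}, best={vals.max():.4f}")

# Boxplot comparison of the two methods across runs.
plt.boxplot([scores["old"], scores["new"]], labels=["old", "new"])
plt.ylabel("accuracy (or F1 score)")
plt.title("Performance across runs with different seeds")
plt.show()
```

The same idea applies to the training-loss curves you mention: run the training several times, then plot the mean curve per iteration (optionally with a shaded band for the standard deviation) rather than the curve from a single, possibly lucky, run.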

What is really unacceptable is to compare the best performance of one method against the mean performance of the other (it should go without saying, but I remember a paper where the authors were happily doing exactly that in order to make their method look better).

Erwan