I have an application that executes my foo() function several times for each user session. There are two alternative algorithms I could implement as foo(), and my goal is to evaluate them based on execution delay.

The number of times foo() is called per user session varies but will not exceed 10,000. Say the delay values (one list per session) are:

Algo1: [ [12, 30, 20, 40, 24, 280], [13, 14, 15, 100], [20, 40] ]
Algo2: [ [1, 10, 5, 4, 150, 20], [14, 10, 20], [21, 33, 41, 79] ]

I'm considering the following options for picking the best algorithm:

  1. take the average of each session, then evaluate the CDF of those averages
  2. take the median of each session, then evaluate the CDF of those medians

Which one is the better metric for picking the winner? Or is there another method that beats both?
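For concreteness, here is how I would compute both candidate metrics on the sample data above. This is only a minimal Python sketch; taking np.mean / np.median per session and a plain empirical CDF is my assumption of what the two options mean.

```python
import numpy as np

# Per-call delays grouped by user session (the sample data above)
algo1 = [[12, 30, 20, 40, 24, 280], [13, 14, 15, 100], [20, 40]]
algo2 = [[1, 10, 5, 4, 150, 20], [14, 10, 20], [21, 33, 41, 79]]

def empirical_cdf(values):
    """Return sorted values and the fraction of observations <= each value."""
    x = np.sort(values)
    return x, np.arange(1, len(x) + 1) / len(x)

for name, sessions in [("Algo1", algo1), ("Algo2", algo2)]:
    means = np.array([np.mean(s) for s in sessions])      # option 1: per-session average
    medians = np.array([np.median(s) for s in sessions])  # option 2: per-session median
    print(name, "session means:  ", means)
    print(name, "session medians:", medians)
    print(name, "CDF of means:   ", empirical_cdf(means))
```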

sbr

2 Answers


Here is a suggestion:

Standardise everything first (if you skip this step, one big number like $9999$ can dominate the result), then take the average value per user session. Then, optionally, multiply this number by $x/10$, for example, where $x$ is the sample size in that user session (think of it as evidence: more samples add more confidence), and finally average over the number of sessions for the algorithm.
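A minimal sketch of this recipe on the sample data from the question, assuming "standardise" means z-scoring all delays against a pooled mean and standard deviation so both algorithms end up on the same scale (that pooling choice is mine, not part of the answer):

```python
import numpy as np

algo1 = [[12, 30, 20, 40, 24, 280], [13, 14, 15, 100], [20, 40]]
algo2 = [[1, 10, 5, 4, 150, 20], [14, 10, 20], [21, 33, 41, 79]]

# Assumption: standardise against the pooled mean/std of *all* delays,
# otherwise the two algorithms' scores would not be comparable.
pooled = np.concatenate([np.concatenate(algo1), np.concatenate(algo2)]).astype(float)
mu, sigma = pooled.mean(), pooled.std()

def score(sessions, weight=True):
    session_scores = []
    for s in sessions:
        z = (np.asarray(s, dtype=float) - mu) / sigma   # standardise
        m = z.mean()                                    # average per user session
        if weight:
            m *= len(s) / 10                            # optional x/10 confidence weight
        session_scores.append(m)
    return np.mean(session_scores)                      # average over sessions

print("Algo1 score:", score(algo1))   # lower standardised delay = better
print("Algo2 score:", score(algo2))
```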

Noah Weber

It is common to look at the 90th or 99th percentile latency in computer systems.

A user won't notice the difference between a couple of milliseconds of lag, but if a function occasionally takes several seconds, that is very noticeable.
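For example, pooling the delays from the question across sessions (one possible choice; you could also compute percentiles per session first), NumPy's percentile function gives the tail latency directly:

```python
import numpy as np

algo1 = [[12, 30, 20, 40, 24, 280], [13, 14, 15, 100], [20, 40]]
algo2 = [[1, 10, 5, 4, 150, 20], [14, 10, 20], [21, 33, 41, 79]]

for name, sessions in [("Algo1", algo1), ("Algo2", algo2)]:
    delays = np.concatenate(sessions)                  # pool every call across sessions
    p50, p90, p99 = np.percentile(delays, [50, 90, 99])
    print(f"{name}: p50={p50:.1f}  p90={p90:.1f}  p99={p99:.1f}")
```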

Brian Spiering