Statistical assessment of experimental results: a graphical approach for comparing algorithms
Abstract
Non-deterministic measurements are common in real-world scenarios: the performance of a stochastic optimization algorithm and the total reward of a reinforcement learning agent in a chaotic environment are just two examples in which outcomes are unpredictable. These measurements can be modeled as random variables and compared with each other via their expected values or via more sophisticated tools such as null hypothesis statistical tests. In this paper, we propose an alternative framework for comparing two random variables according to their cumulative distribution functions. First, we introduce a dominance measure for two random variables that quantifies the proportion in which the cumulative distribution function of one random variable is greater than that of the other. Then, we present a graphical method that allows a visual estimation of the proposed dominance measure and of the probability that one of the random variables takes lower values than the other, as well as a comparison of the quantiles of the two random variables. For illustrative purposes, we re-evaluate the experiments of a previously published work with the proposed methodology and show that additional conclusions, missed by the other methods, can be inferred. Additionally, a software package is provided as a convenient way of applying the proposed framework.
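To make the two quantities mentioned above more concrete, the following Python sketch estimates them from two finite samples. It is an illustration only and not the software package referred to in the abstract: the function names `ecdf`, `prob_lower`, and `cdf_dominance_fraction` are hypothetical, and the dominance estimate assumes a simple definition (the fraction of pooled sample points at which one empirical cumulative distribution function exceeds the other), which may differ from the formal measure defined in the paper.

```python
import numpy as np


def ecdf(sample, points):
    """Empirical CDF of `sample`, evaluated at each value in `points`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts the observations that are <= each point.
    return np.searchsorted(sample, points, side="right") / sample.size


def prob_lower(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) over all pairs of observations."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    return float(np.mean(x < y) + 0.5 * np.mean(x == y))


def cdf_dominance_fraction(x, y):
    """Fraction of pooled sample points where ECDF_X > ECDF_Y.

    Assumption: points where both ECDFs coincide are ignored, and 0.5 is
    returned when the two ECDFs coincide everywhere. This is a simplified
    stand-in, not necessarily the dominance measure proposed in the paper.
    """
    grid = np.unique(np.concatenate([x, y]))
    fx, fy = ecdf(x, grid), ecdf(y, grid)
    differ = np.sum(fx != fy)
    if differ == 0:
        return 0.5
    return float(np.sum(fx > fy) / differ)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: scores of two stochastic algorithms (lower is better).
    a = rng.normal(loc=0.0, scale=1.0, size=500)
    b = rng.normal(loc=0.5, scale=1.0, size=500)
    print("P(A < B) estimate:", prob_lower(a, b))
    print("ECDF dominance fraction of A over B:", cdf_dominance_fraction(a, b))
```

Under these assumptions, a value close to 1 for either quantity suggests that the first variable tends to take lower values than the second, while a value close to 0.5 suggests that neither variable is clearly preferable.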