How should I analyse the results of my experiments with respect to their hyper-parameters?

by nbro   Last Updated January 11, 2019 12:19 PM

I trained (and tested) a CNN with the same architecture on the same task (MNIST). However, I trained it several times, once for each combination of hyper-parameters. I call each of these training (and testing) processes an "experiment". The hyper-parameters that vary are the optimizer (e.g. Adam or SGD), the learning rate of the optimizer, the batch size, etc.

I have performed 36 experiments, so 36 combinations of (some of) the hyper-parameters. I would now like to compare the results against the hyper-parameters. In general, I am interested in knowing or predicting the relations between the hyper-parameters of each experiment and the corresponding results (test accuracy and test loss). Which statistical tools would you advise me to use in this case?
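To be clear about what I mean by "one experiment per combination": the 36 experiments come from taking the Cartesian product of the hyper-parameter values. The exact grid below is only an illustration (these are not my actual values), but it shows the kind of enumeration I did:

```python
from itertools import product

# Hypothetical grid, for illustration only: 2 * 3 * 3 * 2 = 36 combinations.
optimizers = ["adam", "sgd"]
learning_rates = [1e-2, 1e-3, 1e-4]
batch_sizes = [32, 64, 128]
momenta = [0.0, 0.9]

# One experiment (train + test run) per element of this product.
grid = list(product(optimizers, learning_rates, batch_sizes, momenta))
print(len(grid))  # 36
```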

For example, some questions that I may be interested in answering:

  1. How does the learning rate affect the results (test accuracy or loss)?
  2. Is there any correlation between the batch size and the test loss?
  3. Can we infer which set of hyper-parameters produces a better test accuracy for a particular optimizer?

Note that I am aware of techniques like computing the correlation between two variables, but those usually require the two variables to have the same dimension. I'm looking for answers that give concrete advice, not just "find the correlation between the batch size and the test accuracy": I would like to know how I can do it, e.g. "Suppose that you have a list (of size N) of the batch sizes, one batch size for each experiment (and you have N experiments); then you should do X and Y".

Note that I can use some visualization tools to answer my questions, but I would like to have more concrete answers and not just "As the learning rate increases, it seems like the test accuracy improves for some models".
