The generated code is compatible with many CUDNN versions.
Here, we installed different CUDNN versions (keeping the CUDA version fixed at
10.1 update 2) on a Tesla_V100-SXM2-16GB system.
All plots below show a slice of the heuristics within the
7.x CUDNN release cycle.
We examined the effect of the heuristic choice on end-to-end latency and found that it can be significant. This is especially true for older architectures, where the heuristic has not been finely tuned.
We compute the geometric mean (GeoMean) across all models and find that the hypothesis holds.
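
As a rough illustration of the GeoMean aggregation, the sketch below computes a geometric mean over per-model latency ratios in log space. It is a minimal sketch under stated assumptions, not the script used for the plots: the function name `geomean`, the choice of latency ratio as the per-model quantity, and the model names and numbers are all hypothetical placeholders.

```python
import math


def geomean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    values = list(values)
    if not values:
        raise ValueError("geomean needs at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geomean is only defined for positive values")
    # exp(mean(log(v))) avoids overflow from multiplying many ratios together.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-model latency ratios (placeholder numbers, not measurements),
# e.g. latency under one heuristic divided by latency under another.
example_ratios = {"model_a": 2.0, "model_b": 0.5, "model_c": 1.0}

summary = geomean(example_ratios.values())
print(f"GeoMean latency ratio: {summary:.3f}")
```

The geometric mean is the usual choice for averaging ratios because a 2x slowdown on one model and a 2x speedup on another cancel out, which an arithmetic mean would not reflect.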
We find that the heuristics of CUDNN have not improved over the past few years…