Roofline analysis

Published: 2018-03-20
Last Modified: 2021-07-15

Summary

Slides of the talk can be found here

The Roofline Model

the roofline model aims to:
- give realistic expectations of performance and productivity
- show the inherent hardware limitations for a given kernel
- indicate the potential benefits and priority of optimizations
key premise: computation is free, communication is expensive

principal components of performance

they are:
- computation (GFLOP/s is of main interest)
- communication (GB/s is of main interest)
- locality (maximize locality to minimize communication)
each architecture and each kernel strikes a different balance between these

arithmetic intensity

relates computation to communication
ratio of flops performed to bytes moved (flops/byte)
to measure the total bytes moved, traffic at all cache levels (not just DRAM) should be considered
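As a worked example, here is a minimal sketch that counts flops and bytes for two simple double-precision vector kernels, DAXPY (y[i] = a * x[i] + y[i]) and a dot product. It assumes each element is moved to or from memory exactly once (no cache reuse, no write-allocate traffic); the function names are hypothetical.

```python
# Arithmetic intensity (AI) = flops performed / bytes moved.
# Assumption: every element of x and y crosses the memory interface exactly
# once (no reuse from cache, no extra write-allocate reads).

BYTES_PER_DOUBLE = 8


def daxpy_intensity(n: int) -> float:
    """AI of y[i] = a * x[i] + y[i] over n doubles."""
    flops = 2 * n                           # one multiply + one add per element
    bytes_moved = 3 * n * BYTES_PER_DOUBLE  # read x[i], read y[i], write y[i]
    return flops / bytes_moved


def dot_intensity(n: int) -> float:
    """AI of sum(x[i] * y[i]) over n doubles."""
    flops = 2 * n                           # one multiply + one add per element
    bytes_moved = 2 * n * BYTES_PER_DOUBLE  # read x[i], read y[i]; scalar result ignored
    return flops / bytes_moved


print(f"DAXPY AI: {daxpy_intensity(1_000_000):.4f} flops/byte")  # 0.0833
print(f"DOT   AI: {dot_intensity(1_000_000):.4f} flops/byte")    # 0.1250
```

The byte count depends on what the caches actually do: with a write-allocate cache, the store to y[i] first reads the line, so DRAM traffic for DAXPY becomes 32 bytes per element and its intensity drops from 1/12 to 1/16 flops/byte.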

roofline

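the upper bound on performance is given by
- attainable flops/s = min(peak flops/s, streaming BW * arithmetic intensity)
- this assumes complete overlap of computation with communication
plotted as performance vs. arithmetic intensity on log-log axes (a sloped bandwidth roof meeting a flat compute roof), it gives an intuitive graph for kernel analysis and optimization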
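The bound can be sketched directly in code. The machine numbers below (1000 GFLOP/s peak, 100 GB/s streaming bandwidth) are hypothetical, chosen only for illustration, and the helper names are likewise assumptions. The ridge point is the intensity where the two roofs meet; kernels to its left are limited by bandwidth, kernels to its right by compute.

```python
# Roofline bound: attainable GFLOP/s = min(peak GFLOP/s, BW [GB/s] * AI [flops/byte]).
# Hypothetical machine parameters, for illustration only.

PEAK_GFLOPS = 1000.0   # hypothetical peak compute, GFLOP/s
STREAM_BW_GBS = 100.0  # hypothetical streaming bandwidth, GB/s


def attainable_gflops(ai: float, peak: float = PEAK_GFLOPS,
                      bw: float = STREAM_BW_GBS) -> float:
    """Upper bound on performance for a kernel with intensity ai (flops/byte)."""
    return min(peak, bw * ai)


def ridge_point(peak: float = PEAK_GFLOPS, bw: float = STREAM_BW_GBS) -> float:
    """Intensity (flops/byte) where the bandwidth roof meets the compute roof."""
    return peak / bw


def bound_type(ai: float, peak: float = PEAK_GFLOPS,
               bw: float = STREAM_BW_GBS) -> str:
    """Classify a kernel by which roof limits it."""
    return "memory-bound" if ai < ridge_point(peak, bw) else "compute-bound"


# Sample the roof on a log2 grid of intensities: these are the points of the graph.
for ai in [2.0 ** k for k in range(-4, 7)]:
    print(f"AI={ai:8.4f} flops/byte -> <= {attainable_gflops(ai):7.1f} GFLOP/s"
          f" ({bound_type(ai)})")
```

On this hypothetical machine the ridge point is 10 flops/byte, and DAXPY at roughly 1/12 flops/byte is capped near 8.3 GFLOP/s, under 1% of peak. For such a kernel, reducing the bytes moved matters far more than reducing the flops, which is the kind of priority the model is meant to expose.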