Roofline analysis
Summary
Slides of the talk can be found here
The Roofline Model
- sets realistic expectations for performance and productivity
- shows the inherent hardware limitations for a given kernel
- shows the potential benefit and priority of optimizations
- computation is free, communication is expensive
principal components of performance
- computation (GFLOP/s is the main metric)
- communication (GB/s is the main metric)
- locality (maximize locality to minimize communication)
- each architecture and kernel has a different balance between these
arithmetic intensity
- relates computation with communication
- flops to bytes ratio
- total bytes means the traffic actually moved to and from main memory, so the filtering effect of all cache levels has to be taken into account
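As a rough illustration of the flops-to-bytes ratio, the sketch below computes it for a DAXPY-style kernel (`y[i] = a * x[i] + y[i]`). The kernel choice, the function names, and the traffic model (each element of `x` and `y` read once, `y` written once, no cache reuse, no write-allocate traffic) are assumptions made for this example and do not come from the talk.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Flops-to-bytes ratio of a kernel."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def daxpy_intensity(n: int, word_bytes: int = 8) -> float:
    """Intensity of y[i] = a * x[i] + y[i] under the assumed traffic model."""
    # One multiply and one add per element.
    flops = 2 * n
    # Assumed traffic per element: read x[i], read y[i], write y[i].
    bytes_moved = 3 * n * word_bytes
    return arithmetic_intensity(flops, bytes_moved)


if __name__ == "__main__":
    print(f"DAXPY intensity: {daxpy_intensity(1_000_000):.4f} flops/byte")
```

With 8-byte words this gives 2 flops per 24 bytes, i.e. 1/12 flops/byte regardless of `n`, which is why streaming kernels of this kind tend to be limited by bandwidth rather than by compute.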
roofline
- upper bound on performance is given by
- attainable flops/s = min(peak flops/s, streaming-BW * flops-to-bytes-ratio)
- assumes complete overlap of computation with communication
- provides intuitive graph for kernel analysis and optimization
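The bound above can be sketched in a few lines. The machine numbers used here (100 GFLOP/s peak, 20 GB/s streaming bandwidth) and the function names are hypothetical, chosen only to make the arithmetic easy to follow. The `ridge_point` helper is an addition for this example: it returns the flops-to-bytes ratio at which the two terms of the `min` are equal.

```python
def attainable_gflops(peak_gflops: float, stream_bw_gbs: float,
                      intensity: float) -> float:
    """Roofline bound: min(peak flops/s, streaming-BW * flops-to-bytes ratio)."""
    return min(peak_gflops, stream_bw_gbs * intensity)


def ridge_point(peak_gflops: float, stream_bw_gbs: float) -> float:
    """Intensity (flops/byte) at which the bandwidth and compute bounds meet."""
    return peak_gflops / stream_bw_gbs


if __name__ == "__main__":
    peak, bw = 100.0, 20.0  # hypothetical machine, not a measured system
    print("ridge point:", ridge_point(peak, bw), "flops/byte")
    for ai in (0.5, 5.0, 8.0):
        print(f"intensity {ai}: at most {attainable_gflops(peak, bw, ai)} GFLOP/s")
```

On this hypothetical machine, a kernel with an intensity of 0.5 flops/byte is capped at 10 GFLOP/s by bandwidth, while any kernel at or above 5 flops/byte is capped at the 100 GFLOP/s compute peak.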