Summary

Main paper can be found here.

  • there are multiple approaches given here
  • the simplest in terms of cuda is the one-pass sampling
    • generate exponentially distributed random numbers
    • scale them wrt the input weights of the data to be sampled
    • sort this scaled array
    • pick the items corresponding to the indices of the top-k items in this sorted array
  • the issue is that this thing cannot be directly used for permuting arrays!