Accelerating weighted random sampling without replacement
Summary
Main paper can be found here.
- there are multiple approaches given here
- the simplest in terms of cuda is the one-pass sampling
- generate exponentially distributed random numbers
- scale them wrt the input weights of the data to be sampled
- sort this scaled array
- pick the items corresponding to the indices of the top-k items in this sorted array
- the issue is that this thing cannot be directly used for permuting arrays!