Summary of this paper on parallelizing word2vec model training on Intel CPUs.

Convert level-1 BLAS operations (vector-vector dot products) into level-3 BLAS operations (matrix-matrix multiplies) by:
- sharing negative samples
- grouping multiple input context words for a given target word
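
To make the two points above concrete, here is a minimal NumPy sketch of one minibatch step. It assumes a skip-gram model with negative sampling; the names `W_in`, `W_out`, `batched_sgns_step`, and `pairwise_scores` are illustrative and not taken from the paper. Because every context word in the batch is scored against the same target and the same negative samples, the individual dot products collapse into a single matrix multiply, and the gradients become two more.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pairwise_scores(W_in, W_out, context_ids, target_id, negative_ids):
    """Reference: one level-1 dot product per (context, output) pair."""
    out_ids = [target_id, *negative_ids]
    return np.array([[np.dot(W_in[c], W_out[o]) for o in out_ids]
                     for c in context_ids])

def batched_sgns_step(W_in, W_out, context_ids, target_id, negative_ids, lr):
    """One minibatch step (assumed formulation, not the paper's code).

    All context words of one target share the same negative samples,
    so scores and gradients are matrix-matrix products.
    """
    out_ids = np.concatenate(([target_id], negative_ids))  # (K+1,)
    X = W_in[context_ids]            # (B, D) input vectors of the context words
    Y = W_out[out_ids]               # (K+1, D) target + shared negatives
    labels = np.zeros(len(out_ids))
    labels[0] = 1.0                  # only the true target is a positive

    scores = X @ Y.T                 # (B, K+1): one GEMM replaces B*(K+1) dots
    err = labels - sigmoid(scores)   # (B, K+1), labels broadcast over rows
    grad_X = err @ Y                 # (B, D)
    grad_Y = err.T @ X               # (K+1, D)

    # add.at accumulates correctly even if an index appears more than once
    np.add.at(W_in, context_ids, lr * grad_X)
    np.add.at(W_out, out_ids, lr * grad_Y)
    return scores
```

Note that the batched step computes all gradients from the same model snapshot, whereas the pair-by-pair version updates the model after every pair, so the two are not numerically identical after the update; only the scores match exactly.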
For scaling to multiple nodes:
- model update frequency is tied to word frequency
- reduce the starting learning rate as the number of nodes increases
- m-weighted sampling updates
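
A rough sketch of the first two multi-node points, in plain Python. The summary does not give the actual scaling rule or synchronization schedule, so everything here is an assumption made for illustration: the inverse scaling in `starting_lr`, and the `hot_fraction` and `full_sync_every` parameters in `words_to_sync`, are hypothetical placeholders rather than the paper's values. The m-weighted sampling update is not covered.

```python
def starting_lr(base_lr, num_nodes):
    """Starting learning rate that shrinks as nodes are added.

    ASSUMPTION: inverse scaling is a placeholder; the notes only say
    the starting rate is reduced as the node count grows.
    """
    return base_lr / num_nodes

def words_to_sync(word_counts, round_idx, hot_fraction=0.1, full_sync_every=4):
    """Pick which words' vectors to synchronize in this round.

    ASSUMPTION: frequent words are synchronized every round and the
    full vocabulary only every `full_sync_every` rounds. This is one
    hypothetical way to tie update frequency to word frequency.
    """
    ranked = sorted(word_counts, key=word_counts.get, reverse=True)
    if round_idx % full_sync_every == 0:
        return ranked                      # periodic full-model sync
    n_hot = max(1, int(len(ranked) * hot_fraction))
    return ranked[:n_hot]                  # only the most frequent words
```

For example, with counts `{"the": 100, "cat": 5, "sat": 3}`, round 1 would synchronize only `"the"`, while round 0 would synchronize all three words.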