Parallelizing word2vec in multi-core and many-core architectures
Summary of this paper on parallelizing word2vec model training on Intel CPUs.
- Convert level-1 BLAS operations (dot products) to level-3 BLAS operations (matrix-matrix multiplies) by:
  - sharing negative samples
  - grouping multiple input context words for a given target word
- For scaling to multiple nodes:
  - model update frequency is tied to word frequency
  - reduce the starting learning rate as the number of nodes increases
  - m-weighted sampling updates
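
The level-1 to level-3 BLAS conversion listed above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: it assumes a negative-sampling objective with a sigmoid on each score, and the names `batch_gradients_loop`, `batch_gradients_gemm`, `ctx_vecs`, `out_vecs`, and `labels` are made up for this example. Because every context word in the batch is scored against the same target word and the same negative samples, the per-pair dot products collapse into one matrix product.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def batch_gradients_loop(ctx_vecs, out_vecs, labels):
    """Reference version: one dot product per (context, output) pair."""
    grad_ctx = np.zeros_like(ctx_vecs)
    grad_out = np.zeros_like(out_vecs)
    for i in range(ctx_vecs.shape[0]):
        for j in range(out_vecs.shape[0]):
            # Level-1 style: a single vector-vector dot product.
            err = labels[j] - sigmoid(np.dot(ctx_vecs[i], out_vecs[j]))
            grad_ctx[i] += err * out_vecs[j]
            grad_out[j] += err * ctx_vecs[i]
    return grad_ctx, grad_out


def batch_gradients_gemm(ctx_vecs, out_vecs, labels):
    """Batched version: the same gradients from three matrix products."""
    scores = ctx_vecs @ out_vecs.T             # shape (B, S)
    err = labels[None, :] - sigmoid(scores)    # shape (B, S)
    grad_ctx = err @ out_vecs                  # shape (B, D)
    grad_out = err.T @ ctx_vecs                # shape (S, D)
    return grad_ctx, grad_out


# Assumed sizes: 8 context words, 1 target + 5 shared negatives, 16 dims.
rng = np.random.default_rng(0)
B, S, D = 8, 6, 16
ctx_vecs = rng.normal(scale=0.1, size=(B, D))
out_vecs = rng.normal(scale=0.1, size=(S, D))
labels = np.zeros(S)
labels[0] = 1.0  # row 0 is the target word, the rest are negatives

loop_ctx, loop_out = batch_gradients_loop(ctx_vecs, out_vecs, labels)
gemm_ctx, gemm_out = batch_gradients_gemm(ctx_vecs, out_vecs, labels)
print(np.allclose(loop_ctx, gemm_ctx), np.allclose(loop_out, gemm_out))
```

Both functions compute gradients from the same snapshot of the vectors, so they agree up to floating-point rounding. A sequential implementation that updates the model after every pair would differ slightly from either one, which is the trade-off that batching accepts in exchange for matrix-matrix throughput.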