Demystifying word2vec
Summary of this introductory blog post on word2vec.
- Latent Semantic Analysis
- construct a word-by-document matrix of occurrence counts
- convert the raw counts into tf-idf weights
- tf-idf: term frequency times inverse document frequency
- this normalizes the frequency values and prevents stop words (a, an, the) from dominating the matrix
- take the SVD of this matrix and sort the singular values in descending order
- the rows of the (truncated) U matrix then give word vectors; words that occur in similar documents (i.e., share a topic) end up close together (see the code sketch after this list)
- this approach cannot capture subtle relationships between words
- and it certainly cannot model relationships across a sequence of words, since word order is discarded
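Here is a minimal sketch of the LSA pipeline above. The toy corpus, the scikit-learn `TfidfVectorizer`, and the choice of k=2 retained dimensions are illustrative assumptions, not details from the post:

```python
# Minimal LSA sketch: tf-idf term-document matrix -> SVD -> word vectors.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the king spoke to the queen",
    "the man walked with the woman",
    "the gentleman greeted the lady",
]

# TfidfVectorizer yields documents-by-words; transpose to words-by-documents.
vectorizer = TfidfVectorizer()
A = vectorizer.fit_transform(docs).toarray().T  # shape: (n_words, n_docs)

# SVD: A = U @ diag(s) @ Vt; numpy returns singular values sorted descending.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the top-k singular directions; each row of word_vecs is a word vector.
k = 2
word_vecs = U[:, :k] * s[:k]

for word, idx in sorted(vectorizer.vocabulary_.items()):
    print(f"{word:10s} {np.round(word_vecs[idx], 3)}")
```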
- Word2vec
- maps words to vectors such that words appearing in similar contexts end up close together in the vector space (under some norm or similarity measure)
- skip-gram
- predict the surrounding context words given the current (center) word
- pick a context window of +/- c words around the given word
- maximize the average log-probability of the context words, where each probability is a softmax over dot products of word vectors (a numeric sketch follows this list)
- the learned vectors tend to preserve consistent offsets between related word pairs
- e.g. man -> woman, king -> queen, gentleman -> lady, etc.
- this also makes it easy to retrieve similar words for a given word
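The skip-gram objective above can be made concrete with a toy sketch. The vocabulary, vector dimension, and random (untrained) vectors below are assumptions for illustration only:

```python
# Toy illustration of the skip-gram softmax: p(context word | center word)
# from dot products between "input" and "output" word vectors.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["king", "queen", "man", "woman", "the"]
dim = 8

V_in = rng.normal(size=(len(vocab), dim))   # center-word ("input") vectors
V_out = rng.normal(size=(len(vocab), dim))  # context-word ("output") vectors

def context_probs(center_idx):
    """p(w | center) for every word w: softmax over dot products."""
    scores = V_out @ V_in[center_idx]
    exp = np.exp(scores - scores.max())  # subtract max for numerical stability
    return exp / exp.sum()

# Training would maximize log p(observed context word | center word),
# summed over all (center, context) pairs within the +/- c window.
p = context_probs(vocab.index("king"))
print(dict(zip(vocab, np.round(p, 3))))
print("log p(queen | king) =", np.log(p[vocab.index("queen")]))
```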
- continuous bag of words (CBOW)
- predict the current word given its surrounding context (the reverse of skip-gram); see the gensim sketch below
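To tie both modes together, here is a quick usage sketch with gensim (an assumed library choice; the post does not prescribe a specific implementation). Setting `sg=1` trains skip-gram and `sg=0` trains CBOW:

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "spoke", "to", "the", "queen"],
    ["the", "man", "walked", "with", "the", "woman"],
    ["the", "gentleman", "greeted", "the", "lady"],
]

# sg=1 -> skip-gram; sg=0 -> CBOW. Hyperparameters here are illustrative.
model = Word2Vec(sentences, vector_size=50, window=2, sg=1, min_count=1, epochs=50)

# Nearest neighbours in vector space (a toy corpus, so results are noisy).
print(model.wv.most_similar("king", topn=3))

# The classic analogy: vec(king) - vec(man) + vec(woman) ~ vec(queen).
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```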