Proposal

The main paper can be found here.

  • novel GNN-based method for text classification
  • graph is constructed using word co-occurrence and word-doc associations
  • learn both word and doc embeddings simultaneously by building and using the above-mentioned heterogeneous text graph (constructed over the whole corpus), where nodes are both words and docs

Summary

  • text graph construction
    • based on word co-occurrence (in the corpus) and doc-word relations
    • number of nodes = vocab size + number of docs (corpus size)
    • input embeddings are just one-hot vectors
    • edge weights
      • doc to word = TF-IDF of the word in the doc
      • word to word = $$PMI(i, j)$$, but only if $$PMI(i, j)$$ is positive and words i and j co-occur in at least one sliding window (see the graph-construction sketch at the end of this summary)
      • self-loops are added
      • $$PMI(i, j) = \log\frac{P(i, j)}{P(i)\,P(j)}$$
      • $$P(i, j) = \frac{nW(i, j)}{nW}$$
      • $$P(i) = \frac{nW(i)}{nW}$$
      • $$nW(i, j)$$ = number of sliding windows containing both words i and j
      • $$nW(i)$$ = number of sliding windows containing word i
      • $$nW$$ = total number of sliding windows
  • model and training
    • uses a spectral GCN as in Kipf and Welling (2017)
    • 2-layer GCN (see the model sketch at the end of this summary)
    • the output of the second layer is passed through a softmax classifier
    • loss function is cross entropy error over all labelled docs
      • $$L = -\sum_{d} \sum_{f} Y_{df} \ln(Z_{df})$$
      • $$d$$ ranges over the labelled docs
      • $$f$$ ranges over the output classes
      • $$Y_{d}$$ = one-hot label vector of doc $$d$$
      • $$Z_{d}$$ = softmax output vector for doc $$d$$
    • 2 layers also allow information to flow between docs, via the word nodes they share (doc-word-doc paths)
    • more layers did not improve accuracy
    • sliding window size = 20 words
    • first layer output embedding size = 200
    • learning rate = 0.02
    • Adam optimizer with 200 epochs
    • early stopping if the validation loss does not decrease for 10 consecutive epochs (see the early-stopping skeleton at the end of this summary)
    • dropout rate = 0.5
    • 10% of training as validation set
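
To make the graph-construction recipe concrete, here is a minimal Python sketch (not the paper's code) that computes the positive-PMI word-word weights from sliding windows, the TF-IDF doc-word weights, and the self-loops. The function name `build_text_graph_edges`, the string doc-node ids, and the toy documents are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def build_text_graph_edges(docs, window_size=20):
    """docs: list of token lists. Returns a dict {(node_i, node_j): weight}.
    Word nodes are the tokens themselves; doc nodes get ids like 'doc_0'."""
    vocab = sorted({w for doc in docs for w in doc})

    # word-word edges: positive PMI over fixed-size sliding windows
    n_windows = 0
    win_count = Counter()    # nW(i): number of windows containing word i
    pair_count = Counter()   # nW(i, j): number of windows containing both i and j
    for doc in docs:
        for k in range(max(1, len(doc) - window_size + 1)):
            window = doc[k:k + window_size]
            n_windows += 1
            uniq = sorted(set(window))
            win_count.update(uniq)
            pair_count.update(combinations(uniq, 2))

    edges = {}
    for (i, j), n_ij in pair_count.items():
        pmi = math.log((n_ij / n_windows) /
                       ((win_count[i] / n_windows) * (win_count[j] / n_windows)))
        if pmi > 0:                                    # keep only positive PMI
            edges[(i, j)] = edges[(j, i)] = pmi

    # doc-word edges: TF-IDF of the word in the doc
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    for d, doc in enumerate(docs):
        for w, tf in Counter(doc).items():
            edges[(f"doc_{d}", w)] = edges[(w, f"doc_{d}")] = tf * math.log(n_docs / df[w])

    # self-loops on every node (unit weight assumed here)
    for node in vocab + [f"doc_{d}" for d in range(n_docs)]:
        edges[(node, node)] = 1.0
    return edges

if __name__ == "__main__":
    toy_docs = [["graph", "networks", "for", "text", "classification"],
                ["text", "classification", "with", "graph", "networks"]]
    weights = build_text_graph_edges(toy_docs, window_size=3)
    print(f"{len(weights)} weighted edges in the toy graph")
```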
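
And a rough numpy sketch of the 2-layer GCN forward pass and the cross-entropy loss over labelled docs described above. The symmetric adjacency normalization follows Kipf and Welling, but the function names, toy data, and random weight initialization are assumptions, and training details (Adam, dropout, early stopping) are omitted here.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (self-loops already in A)."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_hat, X, W0, W1):
    """Z = softmax(A_hat . ReLU(A_hat . X . W0) . W1), softmax taken row-wise."""
    H = np.maximum(A_hat @ X @ W0, 0.0)            # first layer + ReLU (200-dim here)
    logits = A_hat @ H @ W1                        # second layer: one score per class
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum(axis=1, keepdims=True)

def cross_entropy_labelled(Z, Y, labelled_mask):
    """L = -sum over labelled docs d and classes f of Y_df * ln(Z_df)."""
    return -np.sum(Y[labelled_mask] * np.log(Z[labelled_mask] + 1e-12))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_nodes, n_classes, hidden = 6, 2, 200          # e.g. 4 word nodes + 2 doc nodes
    A = rng.random((n_nodes, n_nodes)); A = (A + A.T) / 2 + np.eye(n_nodes)
    X = np.eye(n_nodes)                             # one-hot input features
    W0 = rng.normal(scale=0.1, size=(n_nodes, hidden))
    W1 = rng.normal(scale=0.1, size=(hidden, n_classes))
    Z = gcn_forward(normalize_adjacency(A), X, W0, W1)
    Y = np.zeros((n_nodes, n_classes)); Y[4, 0] = Y[5, 1] = 1.0  # labels for the 2 doc nodes
    mask = np.array([False] * 4 + [True] * 2)       # only labelled doc nodes enter the loss
    print("loss:", cross_entropy_labelled(Z, Y, mask))
```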
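
Finally, the early-stopping rule can be sketched independently of any framework; `train_step` and `validate` are hypothetical callables, and only the patience logic and the hyperparameter defaults reflect the setup above.

```python
def train_with_early_stopping(train_step, validate, max_epochs=200, patience=10):
    """train_step() runs one training epoch; validate() returns the validation loss.
    Stops once the validation loss has not improved for `patience` epochs."""
    best_loss, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_step()
        val_loss = validate()
        if val_loss < best_loss:
            best_loss, best_epoch = val_loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_loss

if __name__ == "__main__":
    # toy demo with a fake validation curve that stops improving after a few epochs
    fake_losses = iter([1.0, 0.8, 0.7, 0.71] + [0.72] * 20)
    print(train_with_early_stopping(train_step=lambda: None,
                                    validate=lambda: next(fake_losses)))
```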