Proposal

Main paper: Tai, Socher & Manning (2015), "Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks".

  • Generalizes LSTMs to tree-structured network topologies with arbitrary branching factor
  • Demonstrates better results than "linear" (chain-structured) LSTMs on the SemEval 2014 semantic relatedness task and the Stanford Sentiment Treebank

Tree LSTMs

  • "linear" LSTMs compute their current hidden state based on previous hidden state and current input
  • tree LSTMs compute their current hidden state from current input and hidden states of children. There will be one forget gate for each child and only the leaf nodes get to take in the input vectors

Child-Sum Tree-LSTMs

  • useful for unordered children
  • e.g. dependency trees
  • $C(j)$ = set of children of node $j$
  • $\sigma$ = the sigmoid function
  • $x_j$ = input to node $j$
  • $k \in C(j)$ indexes the children in the sums below

h_j' = \sum_{k \in C(j)} h_k

i_j = \sigma(W^{(i)} x_j + U^{(i)} h_j' + b^{(i)})

f_{jk} = \sigma(W^{(f)} x_j + U^{(f)} h_k + b^{(f)})

o_j = \sigma(W^{(o)} x_j + U^{(o)} h_j' + b^{(o)})

u_j = \tanh(W^{(u)} x_j + U^{(u)} h_j' + b^{(u)})

c_j = i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k

h_j = o_j \odot \tanh(c_j)
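The Child-Sum recurrence above can be sketched in plain Python. This is a minimal, dependency-free illustration of a single cell (no training); the helper names and parameter layout (dicts keyed by gate name) are my own, not from the paper:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# helpers for plain-list vectors/matrices
def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def vadd(*vs):
    return [sum(t) for t in zip(*vs)]

def vmul(a, b):
    return [p * q for p, q in zip(a, b)]

def child_sum_cell(x, children, W, U, b):
    """One Child-Sum Tree-LSTM step at node j.

    x        : input vector x_j
    children : list of (c_k, h_k) pairs, one per child k in C(j)
               (empty list at a leaf)
    W, U, b  : parameter dicts keyed by gate name 'i', 'f', 'o', 'u'
    returns  : (c_j, h_j)
    """
    d = len(b['i'])
    # h'_j = sum of child hidden states (zero vector at a leaf)
    h_sum = [0.0] * d
    for _, h_k in children:
        h_sum = vadd(h_sum, h_k)

    i = [sigmoid(v) for v in vadd(matvec(W['i'], x), matvec(U['i'], h_sum), b['i'])]
    o = [sigmoid(v) for v in vadd(matvec(W['o'], x), matvec(U['o'], h_sum), b['o'])]
    u = [math.tanh(v) for v in vadd(matvec(W['u'], x), matvec(U['u'], h_sum), b['u'])]

    # one forget gate per child, computed from that child's own h_k
    c = vmul(i, u)
    for c_k, h_k in children:
        f_k = [sigmoid(v) for v in vadd(matvec(W['f'], x), matvec(U['f'], h_k), b['f'])]
        c = vadd(c, vmul(f_k, c_k))

    h = vmul(o, [math.tanh(v) for v in c])
    return c, h
```

Note that the parameter matrices are shared across children, so the same cell handles any number of (unordered) children — which is exactly why this variant suits dependency trees.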

N-ary Tree-LSTMs

  • useful when the branching factor is at most N
  • finer-grained control over forgetting: each forget gate $f_{jk}$ is conditioned on the hidden states of all children, with a separate parameter matrix per child position
  • child nodes must be ordered

i_j = \sigma(W^{(i)} x_j + \sum_{l=1}^{N} U_l^{(i)} h_{jl} + b^{(i)})

f_{jk} = \sigma(W^{(f)} x_j + \sum_{l=1}^{N} U_{kl}^{(f)} h_{jl} + b^{(f)})

o_j = \sigma(W^{(o)} x_j + \sum_{l=1}^{N} U_l^{(o)} h_{jl} + b^{(o)})

u_j = \tanh(W^{(u)} x_j + \sum_{l=1}^{N} U_l^{(u)} h_{jl} + b^{(u)})

c_j = i_j \odot u_j + \sum_{l=1}^{N} f_{jl} \odot c_{jl}

h_j = o_j \odot \tanh(c_j)
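The N-ary recurrence can be sketched the same way in plain Python. Again a minimal illustration with my own helper names and parameter layout; the key difference from Child-Sum is that each gate has a separate U matrix per child position, and forget gate k sees every child's hidden state through its own grid of matrices U_{kl}:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def vadd(*vs):
    return [sum(t) for t in zip(*vs)]

def vmul(a, b):
    return [p * q for p, q in zip(a, b)]

def nary_cell(x, children, W, U, Uf, b):
    """One N-ary Tree-LSTM step at node j.

    children : ordered list of exactly N (c_l, h_l) pairs
               (pad with zero vectors if there are fewer children)
    W, b     : parameter dicts keyed by gate name 'i', 'f', 'o', 'u'
    U        : dict keyed by 'i', 'o', 'u'; U[g][l] is U_l^{(g)}
    Uf       : Uf[k][l] is U_{kl}^{(f)} - one matrix per
               (forget gate k, child position l) pair
    returns  : (c_j, h_j)
    """
    def gate(g, act):
        z = vadd(matvec(W[g], x), b[g],
                 *[matvec(U[g][l], h_l) for l, (_, h_l) in enumerate(children)])
        return [act(v) for v in z]

    i = gate('i', sigmoid)
    o = gate('o', sigmoid)
    u = gate('u', math.tanh)

    c = vmul(i, u)
    for k, (c_k, _) in enumerate(children):
        # forget gate k depends on ALL children's hidden states
        z = vadd(matvec(W['f'], x), b['f'],
                 *[matvec(Uf[k][l], h_l) for l, (_, h_l) in enumerate(children)])
        f_k = [sigmoid(v) for v in z]
        c = vadd(c, vmul(f_k, c_k))

    h = vmul(o, [math.tanh(v) for v in c])
    return c, h
```

The positional parameters are what require ordered children: swapping two subtrees changes the output, unlike in the Child-Sum cell.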