Proposal
Main paper: Tai, Socher, and Manning, "Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks" (ACL 2015).
- Generalization of LSTMs to tree-structured network topologies with arbitrary
branching factor
- demonstrates better results than linear-chain LSTMs on the SemEval 2014
semantic relatedness (SICK) and Stanford Sentiment Treebank (SST) datasets
Tree LSTMs
- linear-chain LSTMs compute their current hidden state from the previous
hidden state and the current input
- tree LSTMs compute the hidden state of a node from its input vector and the
hidden states of its children. There is one forget gate per child; depending
on the tree type, only some nodes receive input vectors (e.g. in constituency
trees only the leaf nodes consume word vectors, while in dependency trees
every node does)
Child-Sum tree LSTMs
- useful when children are unordered and variable in number
- eg: dependency trees
- notation:
  - C(j) = set of children of node j
  - σ = sigmoid function
  - x_j = input to node j
  - k ∈ C(j) indexes the children
$$\tilde{h}_j = \sum_{k \in C(j)} h_k$$
$$i_j = \sigma\left(W^{(i)} x_j + U^{(i)} \tilde{h}_j + b^{(i)}\right)$$
$$f_{jk} = \sigma\left(W^{(f)} x_j + U^{(f)} h_k + b^{(f)}\right)$$
$$o_j = \sigma\left(W^{(o)} x_j + U^{(o)} \tilde{h}_j + b^{(o)}\right)$$
$$u_j = \tanh\left(W^{(u)} x_j + U^{(u)} \tilde{h}_j + b^{(u)}\right)$$
$$c_j = i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k$$
$$h_j = o_j \odot \tanh(c_j)$$
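The Child-Sum update can be sketched as a minimal NumPy cell. This is an illustrative sketch, not the paper's implementation: the dimensions, random initialization, and names (`child_sum_cell`, `params`) are all assumptions made for the example.

```python
import numpy as np

# Hypothetical sizes for illustration only.
D_IN, D_HID = 4, 5
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One (W, U, b) triple per gate: i (input), f (forget), o (output), u (candidate).
# U^(i), U^(o), U^(u) act on the summed child state h~_j; the same U^(f) is
# applied to each child's own hidden state h_k for that child's forget gate.
params = {g: (rng.standard_normal((D_HID, D_IN)) * 0.1,
              rng.standard_normal((D_HID, D_HID)) * 0.1,
              np.zeros(D_HID))
          for g in "ifou"}

def child_sum_cell(x_j, child_states):
    """One Child-Sum Tree-LSTM step.

    x_j          : input vector at node j
    child_states : list of (h_k, c_k) pairs, one per child (may be empty)
    returns (h_j, c_j)
    """
    hs = [h for h, _ in child_states]
    h_tilde = np.sum(hs, axis=0) if hs else np.zeros(D_HID)  # h~_j = sum_k h_k

    Wi, Ui, bi = params["i"]
    Wf, Uf, bf = params["f"]
    Wo, Uo, bo = params["o"]
    Wu, Uu, bu = params["u"]

    i_j = sigmoid(Wi @ x_j + Ui @ h_tilde + bi)
    o_j = sigmoid(Wo @ x_j + Uo @ h_tilde + bo)
    u_j = np.tanh(Wu @ x_j + Uu @ h_tilde + bu)

    # One forget gate per child, computed from that child's own h_k.
    c_j = i_j * u_j
    for h_k, c_k in child_states:
        f_jk = sigmoid(Wf @ x_j + Uf @ h_k + bf)
        c_j = c_j + f_jk * c_k

    h_j = o_j * np.tanh(c_j)
    return h_j, c_j

# Usage: a root node with two leaf children (leaves have no children).
leaf_a = child_sum_cell(rng.standard_normal(D_IN), [])
leaf_b = child_sum_cell(rng.standard_normal(D_IN), [])
root_h, root_c = child_sum_cell(rng.standard_normal(D_IN), [leaf_a, leaf_b])
print(root_h.shape)  # (5,)
```

Because the children are summed, the cell is invariant to child order and handles any number of children, which is what makes it a fit for dependency trees.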
N-ary tree LSTMs
- useful when the branching factor is at most N
- gives finer-grained control: each forget gate f_jk can condition on the
hidden states of all N children, not just child k
- child nodes must be ordered (each child occupies a fixed position l)
$$i_j = \sigma\left(W^{(i)} x_j + \sum_{l=1}^{N} U_l^{(i)} h_{jl} + b^{(i)}\right)$$
$$f_{jk} = \sigma\left(W^{(f)} x_j + \sum_{l=1}^{N} U_{kl}^{(f)} h_{jl} + b^{(f)}\right)$$
$$o_j = \sigma\left(W^{(o)} x_j + \sum_{l=1}^{N} U_l^{(o)} h_{jl} + b^{(o)}\right)$$
$$u_j = \tanh\left(W^{(u)} x_j + \sum_{l=1}^{N} U_l^{(u)} h_{jl} + b^{(u)}\right)$$
$$c_j = i_j \odot u_j + \sum_{l=1}^{N} f_{jl} \odot c_{jl}$$
$$h_j = o_j \odot \tanh(c_j)$$
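The N-ary case can be sketched the same way; the key difference from Child-Sum is the per-position matrices U_l and the per-(gate, position) grid U_kl for the forget gates. Again a sketch under assumed sizes and names, not the paper's code; N = 2 models a binarized constituency tree.

```python
import numpy as np

D_IN, D_HID, N = 4, 5, 2  # binary tree (N = 2); sizes are illustrative
rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = {g: rng.standard_normal((D_HID, D_IN)) * 0.1 for g in "ifou"}
b = {g: np.zeros(D_HID) for g in "ifou"}
# A separate U_l per child position l for the i, o, u gates ...
U = {g: rng.standard_normal((N, D_HID, D_HID)) * 0.1 for g in "iou"}
# ... and a full U_kl grid for the forget gates: the gate for child k
# reads the hidden states of *all* child positions l.
Uf = rng.standard_normal((N, N, D_HID, D_HID)) * 0.1

def nary_cell(x_j, child_states):
    """One N-ary Tree-LSTM step; child_states is an ordered list of
    exactly N (h, c) pairs (use zero vectors for missing children)."""
    assert len(child_states) == N
    hs = [h for h, _ in child_states]

    def gate(g, act):
        s = W[g] @ x_j + b[g]
        for l in range(N):
            s = s + U[g][l] @ hs[l]  # position-specific U_l
        return act(s)

    i_j = gate("i", sigmoid)
    o_j = gate("o", sigmoid)
    u_j = gate("u", np.tanh)

    c_j = i_j * u_j
    for k, (_, c_k) in enumerate(child_states):
        s = W["f"] @ x_j + b["f"]
        for l in range(N):
            s = s + Uf[k, l] @ hs[l]  # f_jk sees every child's hidden state
        c_j = c_j + sigmoid(s) * c_k

    h_j = o_j * np.tanh(c_j)
    return h_j, c_j

# Usage: two internal nodes feeding a root; leaves are zero states.
zero = (np.zeros(D_HID), np.zeros(D_HID))
left = nary_cell(rng.standard_normal(D_IN), [zero, zero])
right = nary_cell(rng.standard_normal(D_IN), [zero, zero])
root_h, root_c = nary_cell(rng.standard_normal(D_IN), [left, right])
print(root_h.shape)  # (5,)
```

Because the parameter matrices are indexed by child position, swapping two children changes the output, which is exactly the order sensitivity the Child-Sum variant gives up.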