Sequence to Sequence Learning with Neural Networks
Summary
Main paper can be found here.
- general architecture (a minimal sketch follows this list)
  - encoder = a deep LSTM that maps the input seq into a fixed-dim vector
  - decoder = a deep LSTM that generates the output seq from this fixed-dim vector
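A minimal sketch of this encoder-decoder setup, assuming PyTorch; the class name `Seq2Seq`, the vocabulary sizes, and the dimensions are all illustrative choices, not the paper's actual implementation.

```python
# Minimal encoder-decoder sketch (assumes PyTorch; names/sizes are illustrative).
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512, num_layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        # Encoder: deep LSTM that reads the (reversed) source sequence.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        # Decoder: deep LSTM initialized with the encoder's final state,
        # i.e. the fixed-dimensional summary of the input sequence.
        self.decoder = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src, tgt_in):
        _, state = self.encoder(self.src_emb(src))      # state = (h_n, c_n), the fixed-dim summary
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.out(dec_out)                        # logits over the target vocabulary

# Toy usage: batch of 2 source sentences (length 7) and target prefixes (length 5).
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 7))
tgt_in = torch.randint(0, 1200, (2, 5))
logits = model(src, tgt_in)   # shape: (2, 5, 1200)
```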
- contributions
  - reverse the input seq! (see the reversal example after this list)
    - reversing the source sentence puts the first source words close to the first target words, introducing many short-term dependencies between the input/output seq's; this makes optimization easier and improves accuracy
  - encoder/decoder LSTMs
  - negligible increase in computation cost
  - the approach is general, so the same architecture can be applied to other language pairs
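A tiny illustration of the input-reversal trick, assuming sentences are already tokenized into Python lists; only the source side is reversed, the target stays in natural order.

```python
# Illustration of the input-reversal trick (function name and tokens are made up).
def reverse_source(src_tokens):
    """Reverse the source sentence; the target is left untouched."""
    return list(reversed(src_tokens))

src = ["a", "b", "c"]             # source sentence
tgt = ["alpha", "beta", "gamma"]  # its translation

# Without reversal, "a" is separated from its translation "alpha" by the whole
# source sentence; with reversal, "a" is the last token the encoder reads,
# right before the decoder must produce "alpha".
print(reverse_source(src))  # ['c', 'b', 'a']
```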
- training
  - trained on an 8-GPU machine (a simplified device-placement sketch follows this list)
  - 4 GPUs each hold one layer of the 4-layer LSTM
  - the remaining 4 GPUs handle the output softmax
  - minibatch size = 128
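A simplified sketch of the layer-per-GPU idea, assuming PyTorch; it falls back to CPU when no GPUs are present, and it omits the paper's additional softmax sharding across the other 4 GPUs, so it is only a rough stand-in for the original setup.

```python
# Simplified model-parallel sketch (assumes PyTorch): one LSTM layer per device.
import torch
import torch.nn as nn

n_gpus = torch.cuda.device_count()
devices = [torch.device(f"cuda:{i % n_gpus}") if n_gpus else torch.device("cpu")
           for i in range(4)]

# Four single-layer LSTMs standing in for one 4-layer LSTM, each on its own device.
layers = [nn.LSTM(512, 512, batch_first=True).to(d) for d in devices]

def forward(x):
    # Pass activations layer by layer, moving them to the next layer's device.
    for layer, d in zip(layers, devices):
        x, _ = layer(x.to(d))
    return x

out = forward(torch.randn(128, 10, 512))  # minibatch of 128 sequences, length 10
print(out.shape)  # torch.Size([128, 10, 512])
```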
- the model also decodes very long sentences well (see the decoding sketch below)
- claim: with the input seq reversed, even a plain RNN should be trainable for seq2seq
- this work outperforms a mature phrase-based SMT baseline (WMT'14 English-to-French)
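A sketch of left-to-right decoding, assuming PyTorch and the hypothetical `Seq2Seq` model from the architecture sketch above; it uses greedy decoding for simplicity, whereas the paper uses a small beam search.

```python
# Greedy left-to-right decoding sketch (greedy stands in for the paper's beam search).
import torch

def greedy_decode(model, src, bos_id, eos_id, max_len=50):
    model.eval()
    tgt = torch.tensor([[bos_id]])                 # start with <bos>
    with torch.no_grad():
        for _ in range(max_len):
            logits = model(src, tgt)               # re-run decoder on the prefix (simple, not efficient)
            next_id = logits[0, -1].argmax().item()
            tgt = torch.cat([tgt, torch.tensor([[next_id]])], dim=1)
            if next_id == eos_id:                  # stop at end-of-sentence
                break
    return tgt[0].tolist()

# Usage (with the toy model and vocab sizes from the earlier sketch):
# print(greedy_decode(model, torch.randint(0, 1000, (1, 7)), bos_id=1, eos_id=2))
```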