This is amazing. The fact that a high-dimensional representation of more or less arbitrary semantics at the scale of a sentence is not just theoretically possible but actually practical for translation is really exciting. We live in the future.
A new paper on machine translation, by +Oriol Vinyals, +Quoc Le, and myself, on using large deep LSTMs to translate English to French by directly generating translations from the model. The LSTM maps the entire input sentence to a single big vector and then produces a translation from that vector. Our LSTM beats a good phrase-based baseline by 1.5 BLEU points on the entire test set (34.8 vs 33.3), even though this performance measure penalizes our model on out-of-vocabulary words.
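To make the shape of the model concrete, here is a toy sketch of the encoder-decoder idea in modern PyTorch. This is purely illustrative: the framework, layer sizes, vocabulary sizes, and names are my assumptions here, not the actual system described in the paper.

```python
# Toy encoder-decoder LSTM sketch (illustrative only; sizes, names, and the
# framework are assumptions, not the paper's actual implementation).
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512, layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        # The encoder reads the source sentence and compresses it into its
        # final hidden/cell state -- the "big vector".
        self.encoder = nn.LSTM(emb, hidden, layers, batch_first=True)
        # The decoder is initialized with that state and generates the
        # target sentence one token at a time.
        self.decoder = nn.LSTM(emb, hidden, layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))    # state = (h_n, c_n)
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)                      # logits per position

model = Seq2Seq(src_vocab=50_000, tgt_vocab=80_000)
src = torch.randint(0, 50_000, (1, 12))    # one source sentence (token ids)
tgt = torch.randint(0, 80_000, (1, 10))    # teacher-forced target prefix
logits = model(src, tgt)                   # shape: (1, 10, 80_000)
```

During training the decoder is fed the reference translation shifted by one position; at test time it feeds its own predictions back in until it emits an end-of-sentence token.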
Surprisingly, the model "just works" on long sentences, because the LSTM's combined hidden state is very large (8k dimensions), and because we reversed the order of the words in the source sentence. By reversing the source sentences, we introduce many short-term dependencies, which make the optimization problem much easier for gradient descent (a tiny illustration of the trick follows below). The final trick was to simply use a large vocabulary and to train the model for a long time.
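Concretely, the reversal trick is nothing more than feeding the source tokens backwards. The example sentences below are just for illustration:

```python
# Illustrative only: the source-reversal trick.
src = ["I", "am", "a", "student"]
tgt = ["je", "suis", "étudiant"]

# Fed in the normal order, "I" is several steps away from "je". Reversed,
# it is adjacent to the start of decoding, so the beginnings of the source
# and target sentences line up closely, giving gradient descent many
# short-term dependencies to latch onto.
reversed_src = list(reversed(src))   # ["student", "a", "am", "I"]
```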
Our results further confirm the "deep learning hypothesis": a big deep neural network can solve pretty much any problem, provided it has a very big, high-quality labelled training set. And if the results aren't good enough, it's because the model is too small or because it didn't train properly.
The paper will be presented at NIPS 2014.
https://arxiv.org/abs/1409.3215
[1409.3215] Sequence to Sequence Learning with Neural Networks
Vic Gundotra - 2014-09-11 01:21:41-0400
We live in an exciting new age.