This blog aims to describe BERT in detail, along with all the extraordinary tricks it uses. We will split the blog into two parts:
LSTM is dead. Long Live Transformers!
This is the very interesting title of an excellent YouTube video about transformers.
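When training an RNN, we can only do the computation one entry at a time, because each step depends on the hidden state produced by the previous step. As a result, the work cannot be parallelized across the sequence, which makes it hard to use a GPU to speed up training. …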
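To make the sequential bottleneck concrete, here is a minimal sketch in plain Python. It is a toy illustration, not how any real library implements these models: the scalar weights `w_x` and `w_h`, the function names `rnn_states` and `attention_row`, and the sample inputs are all made up for this example. The RNN loop has to run in order because step t reads the hidden state from step t-1, while the toy self-attention output for each position depends only on the raw inputs, so the positions can be computed in any order (and therefore in parallel).

```python
import math

def rnn_states(inputs, w_x=0.5, w_h=0.8):
    """Toy scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = 0.0
    states = []
    for x_t in inputs:
        # Step t cannot start until step t-1 has produced h.
        h = math.tanh(w_x * x_t + w_h * h)
        states.append(h)
    return states

def attention_row(inputs, i):
    """Toy scalar self-attention output for position i.

    It reads only the raw inputs, never another position's output.
    """
    scores = [inputs[i] * x_j for x_j in inputs]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(e / total * x_j for e, x_j in zip(exps, inputs))

inputs = [0.1, -0.4, 0.7, 0.2]  # hypothetical sample sequence

# Sequential: the loop order is forced by the recurrence.
hidden = rnn_states(inputs)

# Order-independent: any processing order gives the same result.
order = [2, 0, 3, 1]
out_of_order = {i: attention_row(inputs, i) for i in order}
in_order = [attention_row(inputs, i) for i in range(len(inputs))]
```

Because no position in the attention computation waits for another, a GPU can evaluate all of them at once. The RNN loop offers no such freedom along the sequence dimension.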