A recurrent neural network has the capacity to have two inputs: a T-1 immediate output from itself and a given input. This means that the order of information given to a RNN is important. Because of their internal memory and time dependence, this network is preferred in speech recognition, language modeling, and translation.
The LSTM is an extension to RNNs. LSTM’s remember inputs over a long period of time. LSTM can learn from important experiences that have long time lags. Fundamentally, there are three gates: input, forget, and output. With analog gates, backpropagation solves the vanishing/exploding gradients because it keeps gradients effectively steep to keep training short and accurate.
See Simeon Kostadinov’s detailed explanation
of LSTM on Towards Data Science