The LSTM is an extension to RNNs. LSTM’s remember inputs over a long period of time. LSTM can learn from important experiences that have long time lags. Fundamentally, there are three gates: input, forget, and output. With analog gates, backpropagation solves the vanishing/exploding gradients because it keeps gradients effectively steep to keep training short and accurate.