Learning Library

LSTMs: Solving RNN Memory Limits

Key Points

  • LSTMs (Long Short‑Term Memory networks) are designed to keep useful context while discarding irrelevant information, mimicking how human short‑term memory works in tasks like solving a murder‑mystery clue sequence.
  • By examining an entire sequence (e.g., letters or words), an LSTM can infer patterns such as “my name is …” that aren’t obvious from isolated elements.
  • An LSTM is a special kind of recurrent neural network (RNN) where each node’s output is fed back as part of the input for the next step, allowing the network to retain information across time steps.
  • Standard RNNs suffer from the long‑term dependency problem—performance degrades as the sequence length grows—whereas LSTMs mitigate this issue through gated mechanisms that control what to remember and what to forget.
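
To make the second point concrete, here is a tiny Python sketch (mine, not from the video) of why full-sequence context disambiguates a next-letter prediction in a way that single-letter context cannot:

```python
# Toy illustration: with only the previous letter as context, the letter
# after 'm' in "my name is" is ambiguous; with the full prefix, it's fixed.

corpus = "my name is"

# One-letter context: collect which letters follow each letter.
follows = {}
for prev, nxt in zip(corpus, corpus[1:]):
    follows.setdefault(prev, []).append(nxt)

print(follows["m"])  # ['y', 'e'] -- 'm' alone doesn't pin down the next letter

# Full-prefix context: the continuation of "my na" is unique in the corpus.
prefix = "my na"
idx = corpus.find(prefix)
print(corpus[idx + len(prefix)])  # 'm' -- unambiguous given the whole prefix
```

This is exactly the gap a recurrent network with memory is meant to close: it carries the earlier part of the sequence forward instead of looking at each element in isolation.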

Full Transcript

# LSTMs: Solving RNN Memory Limits

**Source:** [https://www.youtube.com/watch?v=b61DPVFX03I](https://www.youtube.com/watch?v=b61DPVFX03I)
**Duration:** 00:08:19

## Sections

- [00:00:00](https://www.youtube.com/watch?v=b61DPVFX03I&t=0s) **Memory Limits and LSTM Analogy** - The speaker uses a murder‑mystery dinner scenario and extreme memory‑capacity examples to illustrate how LSTMs selectively retain relevant context while discarding irrelevant information for accurate sequence prediction.

## Full Transcript
Imagine you're at a murder mystery dinner. Right at the start, the lord of the manor abruptly keels over, and your task is to figure out whodunit. It could be the maid; it could be the butler. But you've got a problem: your short-term memory isn't working so well, and you can't remember any of the clues past the last ten minutes. In that sort of situation, your prediction is going to be nothing better than a random guess.

Or imagine you have the opposite problem, where you can remember every word of every conversation you've ever had. If somebody asked you to outline your partner's wedding vows, you might have some trouble doing that: there are just so many words you'd need to process. It would be much better if you could just remember, well, the memorable stuff.

And that's where something called long short-term memory, abbreviated LSTM, comes into play. It allows a neural network to remember the stuff it needs to keep hold of (context), but also to forget the stuff that is no longer applicable.

So take, for example, this sequence of letters. We need to predict what the next letter in the sequence is going to be. Just by looking at the letters individually, it's not obvious what the next letter is: we have two m's, and they each have a different letter following them. So how do we predict the sequence? Well, if we go back through the time series and look at all of the letters in the sequence, we can establish context, and we can clearly see: oh yes, it's "my name is". And if, instead of looking at letters, we looked at words, we can establish that the whole sentence here says "my name is... oh yes, Martin".

Now, a recurrent neural network is really where an LSTM lives; effectively, an LSTM is a type of recurrent neural network.
Recurrent neural networks work in the sense that they have a node. This node receives some input; that input is then processed in some way (there's some kind of computation), and that results in an output. That's pretty standard stuff. What makes an RNN node a little bit different is the fact that it is recurrent, meaning it loops around: the output of a given step is provided alongside the input in the next step. So step one has some input, it's processed, and that results in some output. Then step two has some new input, but it also receives the output of the prior step. That is what makes an RNN a little bit different, and it allows it to remember previous steps in a sequence. So when we're looking at a sentence like "my name i", we don't have to go back too far through those steps to figure out what the context is.

But RNNs do suffer from what's known as the long-term dependency problem, which is to say that over time, as more and more information piles up, RNNs become less effective at learning new things. While we didn't have to go too far back for "my name i", if we were going back through an hour's worth of clues at our murder mystery dinner, that's a lot more information that needs to be processed.

So the LSTM provides a solution to this long-term dependency problem, and that is to add something called an internal state to the RNN node. Now, when an RNN input comes in, the node receives the state information as well. So a step receives the output from the previous step, the input of the new step, and also some state information from the LSTM state. Now, what is this state? Well, it's actually a cell. Let's take a look at what's inside.
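
The plain recurrent step described above can be sketched in a few lines of Python (the scalar weights here are made up and purely illustrative; real RNNs use learned weight matrices over vectors):

```python
import math

# Minimal single-unit RNN sketch with made-up scalar weights.
W_IN, W_REC = 0.5, 0.9

def rnn_step(x, h_prev):
    # Each step combines the new input with the previous step's output,
    # which is what makes the node "recurrent".
    return math.tanh(W_IN * x + W_REC * h_prev)

outputs, h = [], 0.0
for x in [1.0, 0.0, 0.0, 0.0]:  # a signal at step 1, then silence
    h = rnn_step(x, h)
    outputs.append(round(h, 3))

# The first input's influence persists through the feedback loop, but it
# shrinks at every step: a small taste of the long-term dependency problem.
print(outputs)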
So this is an LSTM cell, and it consists of three parts. Each part is a gate: there is a forget gate, there's an input gate, and there's an output gate. The forget gate says what sort of information stored in this internal state can be forgotten because it's no longer contextually relevant. The input gate says what new information we should add or update into this working-storage state. And the output gate says: of all the information stored in that state, which part should be output in this particular instance?

These gates can be assigned numbers between zero and one, where zero means the gate is effectively closed and nothing gets through, and one means the gate is wide open and everything gets through. So we can say forget everything, or just forget a little bit; we can add everything to the state, or add just a little bit; and we can output everything, output a little bit, or output nothing at all.

So now, when we're processing in our RNN cell, we have this additional state information that can provide us with some additional context. Take an example of another sentence, like "Martin is buying apples". There's some information we might want to store in this state. "Martin" most likely implies a male subject, so we might want to store that, because that might be useful. "Apples" is a plural, so maybe we're going to store that it's a plural for later on. Now, as this sentence continues to develop, it starts to talk about Jennifer: "Jennifer is...". At this point we can make some changes to our state data. We've changed subjects from Martin to Jennifer, so we don't care about Martin's gender anymore, and we can forget that part.
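
Putting the three gates together, a single LSTM cell step can be sketched like this (scalar, made-up weights; real cells use learned weight matrices and bias terms over vectors):

```python
import math

def sigmoid(z):
    # Squashes any number into (0, 1): 0 means gate closed, 1 means wide open.
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    f = sigmoid(w["f_x"] * x + w["f_h"] * h_prev)         # forget gate
    i = sigmoid(w["i_x"] * x + w["i_h"] * h_prev)         # input gate
    o = sigmoid(w["o_x"] * x + w["o_h"] * h_prev)         # output gate
    c_cand = math.tanh(w["c_x"] * x + w["c_h"] * h_prev)  # candidate state
    c = f * c_prev + i * c_cand  # drop part of the old state, blend in new info
    h = o * math.tanh(c)         # emit only the gated part of the state
    return h, c

# Illustrative (untrained) weights, all set to 0.5.
w = {k: 0.5 for k in ["f_x", "f_h", "i_x", "i_h", "o_x", "o_h", "c_x", "c_h"]}

h, c = 0.0, 0.0  # initial output and internal cell state
for x in [1.0, 0.0, 1.0]:
    h, c = lstm_step(x, h, c, w)
print(round(h, 3), round(c, 3))
```

Notice that the internal state `c` is carried forward separately from the output `h`, which is what lets the cell keep context around without having to re-derive it at every step.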
And we can say the most likely gender for Jennifer is female, and store that instead. That, really, is how we can apply an LSTM to any sort of series where we have a sequence prediction to make and some long-term dependency data to go alongside it.

Now, some typical use cases for LSTMs: machine translation is a good one, and another is chatbots, specifically Q&A chatbots, where we might need to retrieve some information that appeared in a previous step of the conversation and recall it later on. These are all good examples of where we have a time sequence of things and some long-term dependencies. And had we also applied an LSTM to our murder mystery dinner, we probably could have won first prize by having it forecast the whodunit for us: it was the butler.

If you have any questions, please drop us a line below, and if you want to see more videos like this in the future, please consider liking and subscribing.
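
As a closing illustration of the Q&A chatbot use case mentioned in the transcript, here is a toy sketch of carrying state across turns; all names, phrases, and the matching rules are made up for this example, not from the video:

```python
# Toy chatbot memory: a fact stored in working state during an early turn
# is recalled in a later turn, much like an LSTM's input and output gates.

state = {}

def process_turn(text, state):
    # "Input gate": store facts that look useful later.
    if "my name is" in text:
        state["user_name"] = text.split("my name is")[-1].strip()
    # "Output gate": recall stored state when it is needed.
    if "what is my name" in text:
        return state.get("user_name", "I don't know yet")
    return "ok"

print(process_turn("hello, my name is Martin", state))  # stores the name
print(process_turn("what is my name?", state))          # recalls "Martin"
```

A real chatbot would learn what to store and recall rather than using hand-written rules, but the bookkeeping is the same idea: keep what's contextually useful, and drop or overwrite it when the context changes.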