What is nn.LSTM layer in PyTorch?

An LSTM layer helps a model remember important information from sequences, like sentences or time series, so it can make better predictions.

nn.LSTM layer in PyTorch - Syntax, Examples & Explanation

Practice

1. What is the primary purpose of the nn.LSTM layer in PyTorch?

easy

A. To process and remember information from sequences over time

B. To perform image classification using convolution

C. To reduce the dimensionality of data using PCA

D. To generate random numbers for initialization

Solution

Step 1: Understand the role of LSTM
LSTM stands for Long Short-Term Memory, a type of recurrent neural network layer designed to handle sequence data and remember information over time.
Step 2: Match purpose with options
Among the options, only processing and remembering sequence information matches the LSTM's purpose.
Final Answer:
To process and remember information from sequences over time -> Option A
Quick Check:
LSTM purpose = sequence memory [OK]

Hint: LSTM = sequence memory layer, not image or random [OK]

Common Mistakes:

Confusing LSTM with convolutional layers
Thinking LSTM reduces data dimension like PCA
Assuming LSTM generates random numbers

2. Which of the following is the correct way to create an LSTM layer in PyTorch with input size 10 and hidden size 20?

easy

A. nn.LSTM(input=10, hidden=20)

B. nn.LSTM(20, 10)

C. nn.LSTM(10, 20)

D. nn.LSTM(hidden_size=10, input_size=20)

Solution

Step 1: Recall nn.LSTM constructor parameters
The first argument is input_size (features per input), the second is hidden_size (features in hidden state).
Step 2: Match correct syntax
nn.LSTM(10, 20) passes both sizes positionally in the correct order, so it sets input_size=10 and hidden_size=20.
Final Answer:
nn.LSTM(10, 20) -> Option C
Quick Check:
Constructor order = input_size, hidden_size [OK]

Hint: First arg input size, second hidden size in nn.LSTM() [OK]

Common Mistakes:

Swapping input_size and hidden_size
Using wrong keyword arguments
Confusing parameter names
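
To make the constructor order concrete, here is a minimal sketch that builds the layer both positionally and with keywords, then reads the sizes back from the module. The variable names are illustrative only and are not part of the quiz.

```python
import torch.nn as nn

# Positional form: the first argument is input_size, the second is hidden_size.
lstm_positional = nn.LSTM(10, 20)

# Equivalent keyword form, which makes an accidental swap easier to spot.
lstm_keyword = nn.LSTM(input_size=10, hidden_size=20)

print(lstm_positional.input_size, lstm_positional.hidden_size)
print(lstm_keyword.input_size, lstm_keyword.hidden_size)

# Option A uses keyword names that the constructor does not accept.
wrong_keywords_error = None
try:
    nn.LSTM(input=10, hidden=20)
except TypeError as err:
    wrong_keywords_error = err
    print("Rejected:", err)
```

Writing the keyword form in real code is a reasonable habit, since swapped positional sizes only show up later as a shape mismatch.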

3. Given the code below, what is the shape of output after running the LSTM?

import torch
import torch.nn as nn
lstm = nn.LSTM(input_size=5, hidden_size=3, num_layers=1)
inputs = torch.randn(4, 2, 5)  # seq_len=4, batch=2, input_size=5
output, (hn, cn) = lstm(inputs)

medium

A. (4, 2, 3)

B. (2, 4, 3)

C. (4, 3, 2)

D. (2, 3, 4)

Solution

Step 1: Understand LSTM input and output shapes
With the default batch_first=False, the input shape is (seq_len, batch, input_size) and the output shape is (seq_len, batch, hidden_size).
Step 2: Apply given dimensions
Input shape is (4, 2, 5), hidden_size=3, so output shape is (4, 2, 3).
Final Answer:
(4, 2, 3) -> Option A
Quick Check:
Output shape = (seq_len, batch, hidden_size) [OK]

Hint: Output shape matches (seq_len, batch, hidden_size) [OK]

Common Mistakes:

Mixing batch and sequence dimensions
Confusing input_size with hidden_size
Assuming output shape swaps batch and seq_len
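
The shapes can be checked directly by running the snippet from the question. The sketch below also prints the shapes of hn and cn, and adds a second layer built with batch_first=True (an addition for comparison, not part of the original question) to show how that flag moves the batch dimension to the front.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=5, hidden_size=3, num_layers=1)
inputs = torch.randn(4, 2, 5)  # (seq_len=4, batch=2, input_size=5)
output, (hn, cn) = lstm(inputs)

print(output.shape)  # torch.Size([4, 2, 3]) = (seq_len, batch, hidden_size)
print(hn.shape)      # torch.Size([1, 2, 3]) = (num_layers, batch, hidden_size)
print(cn.shape)      # torch.Size([1, 2, 3])

# Comparison case: with batch_first=True the layer expects (batch, seq_len, input_size).
lstm_batch_first = nn.LSTM(input_size=5, hidden_size=3, batch_first=True)
output_bf, _ = lstm_batch_first(torch.randn(2, 4, 5))
print(output_bf.shape)  # torch.Size([2, 4, 3]) = (batch, seq_len, hidden_size)
```

Note that batch_first affects only the input and output tensors; hn and cn keep the (num_layers, batch, hidden_size) layout either way.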

4. What is wrong with this code snippet that tries to create an LSTM layer?

import torch.nn as nn
lstm = nn.LSTM(10)

medium

A. The input size must be a tuple, not an integer

B. It misses the hidden_size argument, causing an error

C. LSTM requires a batch size argument at creation

D. The code is correct and runs without error

Solution

Step 1: Check nn.LSTM constructor requirements
nn.LSTM requires two arguments, input_size and hidden_size. Neither has a default value, and they can be passed positionally or by keyword.
Step 2: Identify missing argument
The code provides only input_size=10 and omits hidden_size, so the constructor raises a TypeError.
Final Answer:
It misses the hidden_size argument, causing an error -> Option B
Quick Check:
nn.LSTM needs input_size and hidden_size [OK]

Hint: nn.LSTM needs two sizes: input and hidden [OK]

Common Mistakes:

Thinking batch size is needed at layer creation
Assuming input_size can be a tuple
Believing code runs without error
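
A short sketch that reproduces the failure and then the fix. The exact wording of the error message may differ between PyTorch versions, so the code checks only the exception type; the hidden size of 20 in the corrected call is an arbitrary value chosen for illustration.

```python
import torch.nn as nn

raised_type_error = False
try:
    lstm = nn.LSTM(10)  # hidden_size is missing
except TypeError as err:
    raised_type_error = True
    print("TypeError:", err)

# Corrected call: both sizes supplied (20 is an arbitrary example value).
lstm = nn.LSTM(input_size=10, hidden_size=20)
print(lstm.hidden_size)
```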

5. You want to build a model that processes sequences of length 6 with 8 features each. You want the LSTM to output a sequence with 12 features per time step. Which of the following LSTM layer initializations is correct to achieve this?

hard

A. nn.LSTM(input_size=12, hidden_size=8)

B. nn.LSTM(input_size=8, hidden_size=6)

C. nn.LSTM(input_size=6, hidden_size=8)

D. nn.LSTM(input_size=8, hidden_size=12)

Solution

Step 1: Identify input_size and hidden_size meanings
input_size is the number of features per time step in the input sequence. hidden_size is the number of features in the hidden state, which for a standard unidirectional LSTM is also the number of output features per time step.
Step 2: Match given sequence and desired output
Input sequences have 8 features, so input_size=8. Desired output features per time step is 12, so hidden_size=12.
Final Answer:
nn.LSTM(input_size=8, hidden_size=12) -> Option D
Quick Check:
Input features = 8, output features = 12 [OK]

Hint: Input size = input features, hidden size = output features [OK]

Common Mistakes:

Confusing sequence length with input_size
Swapping input_size and hidden_size
Using sequence length as hidden_size
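
The sketch below runs the correct option on a random batch to confirm the shapes. The batch size of 3 is an assumption made for illustration, since the question does not specify one. It also shows one caveat to the hint above: with bidirectional=True the output carries 2 * hidden_size features per time step.

```python
import torch
import torch.nn as nn

seq_len, batch_size, num_features = 6, 3, 8  # batch_size=3 is an assumed value

lstm = nn.LSTM(input_size=8, hidden_size=12)
inputs = torch.randn(seq_len, batch_size, num_features)
output, _ = lstm(inputs)
print(output.shape)  # torch.Size([6, 3, 12])

# The sequence length never appears in the constructor; the same layer
# accepts sequences of a different length.
longer_output, _ = lstm(torch.randn(10, batch_size, num_features))
print(longer_output.shape)  # torch.Size([10, 3, 12])

# Caveat: a bidirectional LSTM concatenates both directions.
bi_lstm = nn.LSTM(input_size=8, hidden_size=12, bidirectional=True)
bi_output, _ = bi_lstm(inputs)
print(bi_output.shape)  # torch.Size([6, 3, 24])
```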
