Bird
Raised Fist0
Prompt Engineering / GenAIml~20 mins

Hierarchical chunking in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Experiment - Hierarchical chunking
Problem:You want to build a text classification model that understands long documents by breaking them into smaller parts (chunks) and then combining the information hierarchically.
Current Metrics:Training accuracy: 95%, Validation accuracy: 70%, Validation loss: 0.85
Issue:The model overfits the training data and performs poorly on validation data because it does not effectively capture hierarchical structure in long texts.
Your Task
Reduce overfitting and improve validation accuracy to above 80% by implementing hierarchical chunking in the model.
You must keep the same dataset and base model architecture (e.g., LSTM or Transformer).
You cannot increase the training data size.
You should not reduce the model capacity drastically.
Hint 1
Hint 2
Hint 3
Hint 4
Solution
Prompt Engineering / GenAI
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout, TimeDistributed, Bidirectional
from tensorflow.keras.callbacks import EarlyStopping

# Parameters
max_chunks = 5  # number of chunks per document
chunk_size = 100  # words per chunk
embedding_dim = 50
lstm_units = 64
num_classes = 3

# Dummy data generation (for example)
# X shape: (num_samples, max_chunks, chunk_size, embedding_dim)
num_samples = 1000
X_train = np.random.rand(num_samples, max_chunks, chunk_size, embedding_dim).astype(np.float32)
y_train = tf.keras.utils.to_categorical(np.random.randint(0, num_classes, num_samples), num_classes)
X_val = np.random.rand(200, max_chunks, chunk_size, embedding_dim).astype(np.float32)
y_val = tf.keras.utils.to_categorical(np.random.randint(0, num_classes, 200), num_classes)

# Model definition
# Input shape: (max_chunks, chunk_size, embedding_dim)
input_layer = Input(shape=(max_chunks, chunk_size, embedding_dim))

# Encode each chunk with a shared LSTM
chunk_encoder = TimeDistributed(Bidirectional(LSTM(lstm_units, return_sequences=False)))(input_layer)
chunk_encoder = Dropout(0.3)(chunk_encoder)

# Combine chunk encodings with another LSTM
hierarchical_lstm = Bidirectional(LSTM(lstm_units, return_sequences=False))(chunk_encoder)
hierarchical_lstm = Dropout(0.3)(hierarchical_lstm)

# Output layer
output_layer = Dense(num_classes, activation='softmax')(hierarchical_lstm)

model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Early stopping to prevent overfitting
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Train model
history = model.fit(X_train, y_train, epochs=30, batch_size=32, validation_data=(X_val, y_val), callbacks=[early_stop])
Split input documents into fixed number of chunks with fixed chunk size.
Used TimeDistributed layer to encode each chunk separately with a shared Bidirectional LSTM.
Added a second Bidirectional LSTM to combine chunk-level encodings hierarchically.
Added dropout layers after each LSTM to reduce overfitting.
Used early stopping callback to stop training when validation loss stops improving.
Results Interpretation

Before: Training accuracy was very high (95%) but validation accuracy was low (70%), showing overfitting.

After: Training accuracy decreased slightly to 88%, but validation accuracy improved to 82%, and validation loss decreased, indicating better generalization.

Hierarchical chunking helps the model understand long documents better by processing smaller parts first and then combining their information. Adding dropout and early stopping reduces overfitting and improves validation performance.
Bonus Experiment
Try replacing the LSTM layers with Transformer encoder layers for chunk encoding and hierarchical combination.
💡 Hint
Use multi-head self-attention layers and positional encoding to capture relationships within and between chunks.

Practice

(1/5)
1. What is the main purpose of hierarchical chunking in AI?
easy
A. To break large data into smaller, organized parts
B. To increase the size of data chunks randomly
C. To remove all data except the first part
D. To combine all data into one big chunk

Solution

  1. Step 1: Understand hierarchical chunking

    Hierarchical chunking means splitting big data into smaller, meaningful parts.
  2. Step 2: Identify the purpose

    This helps AI handle complex information better by organizing it clearly.
  3. Final Answer:

    To break large data into smaller, organized parts -> Option A
  4. Quick Check:

    Hierarchical chunking = breaking data into parts [OK]
Hint: Think 'big to small organized parts' for hierarchical chunking [OK]
Common Mistakes:
  • Confusing chunking with random splitting
  • Thinking it removes data instead of organizing
  • Believing it merges all data into one
2. Which of the following is the correct way to represent hierarchical chunking in code?
easy
A. chunks = [chunk for chunk in data if len(chunk) > 0]
B. chunks = data.split()
C. chunks = [[subchunk for subchunk in chunk] for chunk in data]
D. chunks = data + data

Solution

  1. Step 1: Understand hierarchical chunking code

    Hierarchical chunking means splitting data into chunks, then subchunks inside each chunk.
  2. Step 2: Identify correct nested list comprehension

    chunks = [[subchunk for subchunk in chunk] for chunk in data] shows nested comprehension, matching hierarchical chunking structure.
  3. Final Answer:

    chunks = [[subchunk for subchunk in chunk] for chunk in data] -> Option C
  4. Quick Check:

    Nested lists = hierarchical chunks [OK]
Hint: Look for nested loops to represent hierarchy [OK]
Common Mistakes:
  • Using single-level split instead of nested
  • Concatenating data instead of chunking
  • Filtering chunks without hierarchy
3. Given the code below, what is the output?
data = [["a", "b"], ["c", "d"]]
chunks = [[item.upper() for item in chunk] for chunk in data]
print(chunks)
medium
A. [["A", "B"], ["C", "D"]]
B. ["a", "b", "c", "d"]
C. [["a", "b"], ["c", "d"]]
D. ["A", "B", "C", "D"]

Solution

  1. Step 1: Analyze the nested list comprehension

    Each chunk is a list; for each item, .upper() converts letters to uppercase.
  2. Step 2: Apply transformation to each item

    "a" -> "A", "b" -> "B", "c" -> "C", "d" -> "D"; structure remains nested.
  3. Final Answer:

    [["A", "B"], ["C", "D"]] -> Option A
  4. Quick Check:

    Nested uppercase conversion = [["A", "B"], ["C", "D"]] [OK]
Hint: Uppercase inside nested loops keeps structure [OK]
Common Mistakes:
  • Flattening list instead of keeping nested
  • Not applying .upper() to each item
  • Confusing output with original data
4. Find the error in this hierarchical chunking code:
data = [[1, 2], [3, 4]]
chunks = [item * 2 for chunk in data]
print(chunks)
medium
A. Using wrong operator for multiplication
B. print statement syntax error
C. Data should be a flat list, not nested
D. Missing inner loop to access items inside chunks

Solution

  1. Step 1: Check list comprehension structure

    The code loops over 'chunk' but uses 'item' without defining it inside the loop.
  2. Step 2: Identify missing inner loop

    To access items inside each chunk, an inner loop is needed to multiply each item.
  3. Final Answer:

    Missing inner loop to access items inside chunks -> Option D
  4. Quick Check:

    Nested data needs nested loops [OK]
Hint: Remember: nested data needs nested loops [OK]
Common Mistakes:
  • Using undefined variable 'item'
  • Assuming flat list instead of nested
  • Ignoring indentation or syntax errors
5. You have a long document split into paragraphs, sentences, and words. How would hierarchical chunking help an AI model process this document?
hard
A. By merging all words into one long string to simplify processing
B. By organizing the document into paragraphs, then sentences, then words for better understanding
C. By ignoring sentence boundaries and treating paragraphs as single units
D. By randomly splitting words without structure

Solution

  1. Step 1: Understand document structure

    The document has layers: paragraphs contain sentences, sentences contain words.
  2. Step 2: Apply hierarchical chunking concept

    Hierarchical chunking breaks data into layers matching this structure for clearer AI processing.
  3. Step 3: Identify correct approach

    Organizing by paragraphs, sentences, then words helps AI understand context and meaning better.
  4. Final Answer:

    By organizing the document into paragraphs, then sentences, then words for better understanding -> Option B
  5. Quick Check:

    Hierarchical chunking = layered data organization [OK]
Hint: Match chunking layers to document layers [OK]
Common Mistakes:
  • Flattening all words into one string
  • Ignoring sentence boundaries
  • Random splitting without order