Prompt Engineering / GenAI · ~20 mins

Parent-child document retrieval in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Parent-child document retrieval
Problem: You want to build a model that retrieves child documents based on their parent documents in a dataset. The current model retrieves child documents but often misses relevant ones or retrieves irrelevant children.
Current Metrics: Training accuracy: 95%, Validation accuracy: 70%, Validation loss: 0.85
Issue: The model is overfitting. It performs very well on training data but poorly on validation data, indicating it does not generalize well to new parent-child pairs.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.
You cannot change the dataset or add more data.
You must keep the parent-child retrieval architecture but can adjust model hyperparameters and add regularization.
Solution
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

# Sample parent-child retrieval model
input_parent = layers.Input(shape=(100,), name='parent_input')
input_child = layers.Input(shape=(100,), name='child_input')

# Shared embedding layer
embedding = layers.Dense(64, activation='relu')
parent_emb = embedding(input_parent)
child_emb = embedding(input_child)

# Add dropout to reduce overfitting
parent_emb = layers.Dropout(0.3)(parent_emb)
child_emb = layers.Dropout(0.3)(child_emb)

# Combine embeddings
combined = layers.concatenate([parent_emb, child_emb])

# Smaller dense layers
x = layers.Dense(32, activation='relu')(combined)
x = layers.Dropout(0.3)(x)
output = layers.Dense(1, activation='sigmoid')(x)

model = models.Model(inputs=[input_parent, input_child], outputs=output)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Early stopping callback
early_stop = callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Assuming X_train_parent, X_train_child, y_train, X_val_parent, X_val_child, y_val are defined
# model.fit([X_train_parent, X_train_child], y_train, epochs=50, batch_size=32, validation_data=([X_val_parent, X_val_child], y_val), callbacks=[early_stop])
What changed:
Added dropout layers after the embedding and dense layers to reduce overfitting.
Reduced the dense layer size from 64 to 32 units to simplify the model.
Lowered the learning rate from 0.001 to 0.0005 for smoother training.
Added early stopping to halt training when validation loss stops improving.
Results Interpretation

Before: Training accuracy was 95%, validation accuracy was 70%, showing overfitting.

After: Training accuracy dropped to 90%, validation accuracy improved to 87%, and validation loss decreased, indicating better generalization.

Adding dropout, reducing model complexity, lowering learning rate, and using early stopping help reduce overfitting and improve validation accuracy in parent-child document retrieval models.
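Since the task allows adding regularization, an L2 weight penalty on the shared embedding layer is another option alongside dropout. A minimal sketch of the same retrieval model with weight decay added (the 1e-4 penalty strength is an illustrative value, not taken from the experiment):

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

input_parent = layers.Input(shape=(100,), name='parent_input')
input_child = layers.Input(shape=(100,), name='child_input')

# Shared embedding layer with an L2 penalty on its weights;
# the penalty is added to the training loss automatically
embedding = layers.Dense(64, activation='relu',
                         kernel_regularizer=regularizers.l2(1e-4))
parent_emb = embedding(input_parent)
child_emb = embedding(input_child)

combined = layers.concatenate([parent_emb, child_emb])
output = layers.Dense(1, activation='sigmoid')(combined)

model = models.Model(inputs=[input_parent, input_child], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
```

Like dropout, the penalty discourages the model from memorizing the training pairs; unlike dropout, it acts on the weights themselves rather than the activations, so the two can be combined.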
Bonus Experiment
Try using a contrastive loss function instead of binary crossentropy to better learn the relationship between parent and child documents.
💡 Hint
Contrastive loss encourages the model to bring related parent-child pairs closer in embedding space and push unrelated pairs apart, which can improve retrieval accuracy.
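One way to sketch this is a siamese variant of the model above whose output is the Euclidean distance between the parent and child embeddings, trained with a hand-written contrastive loss (the margin of 1.0 and the distance-based formulation are illustrative choices, not prescribed by the exercise):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def contrastive_loss(margin=1.0):
    # y_true: 1 for related parent-child pairs, 0 for unrelated ones
    # y_pred: Euclidean distance between the two embeddings
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        positive = y_true * tf.square(y_pred)                                # pull related pairs together
        negative = (1.0 - y_true) * tf.square(tf.maximum(margin - y_pred, 0.0))  # push unrelated pairs apart
        return tf.reduce_mean(positive + negative)
    return loss

input_parent = layers.Input(shape=(100,), name='parent_input')
input_child = layers.Input(shape=(100,), name='child_input')

# Same shared embedding layer as in the solution
embedding = layers.Dense(64, activation='relu')
parent_emb = embedding(input_parent)
child_emb = embedding(input_child)

# The model outputs a distance rather than a sigmoid score
distance = layers.Lambda(
    lambda t: tf.sqrt(tf.reduce_sum(tf.square(t[0] - t[1]),
                                    axis=1, keepdims=True) + 1e-9)
)([parent_emb, child_emb])

model = models.Model(inputs=[input_parent, input_child], outputs=distance)
model.compile(optimizer='adam', loss=contrastive_loss(margin=1.0))
```

At retrieval time, children can then be ranked by their distance to the parent embedding, with smaller distances indicating stronger parent-child relationships.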