PII detection and redaction in Prompt Engineering / GenAI - Model Pipeline Trace

Model Pipeline - PII detection and redaction

This pipeline detects personal information in text and replaces it with safe placeholders. It helps protect privacy by automatically finding and hiding sensitive data like names, emails, and phone numbers.

Data Flow - 5 Stages
1. Input Text
   Input: 1000 sentences x variable length
   Operation: Raw text containing personal information
   Output: 1000 sentences x variable length
   Example: "Hello, my name is Alice and my email is alice@example.com."
2. Text Preprocessing
   Input: 1000 sentences x variable length
   Operation: Lowercase, remove extra spaces, split each sentence into tokens
   Output: 1000 sentences x tokens per sentence
   Example: ["hello", ",", "my", "name", "is", "alice", "and", "my", "email", "is", "alice@example.com", "."]
3. Feature Extraction
   Input: 1000 sentences x tokens
   Operation: Convert tokens to word embeddings (numeric vectors)
   Output: 1000 sentences x tokens x 300 features
   Example: [[0.12, -0.05, ..., 0.33], ..., [0.45, 0.01, ..., -0.22]]
4. Model Prediction
   Input: 1000 sentences x tokens x 300 features
   Operation: Named Entity Recognition model tags each token as PII (B-PER, B-EMAIL) or not (O)
   Output: 1000 sentences x tokens with PII tags
   Example: [('hello', 'O'), (',', 'O'), ('my', 'O'), ('name', 'O'), ('is', 'O'), ('alice', 'B-PER'), ('and', 'O'), ('my', 'O'), ('email', 'O'), ('is', 'O'), ('alice@example.com', 'B-EMAIL'), ('.', 'O')]
5. Redaction
   Input: 1000 sentences x tokens with PII tags
   Operation: Replace tagged PII with placeholders in the original text, so the casing and punctuation of everything else are preserved
   Output: 1000 sentences with PII replaced by placeholders
   Example: "Hello, my name is [PERSON] and my email is [EMAIL]."
Training Trace - Epoch by Epoch

Loss per epoch (one * per 0.05)
Epoch 1 | ***************** 0.85
Epoch 2 | ************      0.60
Epoch 3 | *********         0.45
Epoch 4 | *******           0.35
Epoch 5 | ******            0.30
Epoch  Loss ↓  Accuracy ↑  Observation
1      0.85    0.65        Model starts learning to identify PII with moderate accuracy.
2      0.60    0.78        Loss decreases and accuracy improves as the model learns patterns.
3      0.45    0.85        Model shows good ability to detect PII entities.
4      0.35    0.90        Further improvement with more precise tagging.
5      0.30    0.92        Gains are leveling off: loss drops by only 0.05 and accuracy rises by only 0.02.
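
The trace does not say how the Loss and Accuracy columns are computed. The following is a minimal sketch, assuming token-level accuracy and mean cross-entropy loss. Both are common choices for tagging models but are assumptions here, and the function names are hypothetical.

```python
import math


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag (assumed metric)."""
    pairs = [(g, p) for gs, ps in zip(gold, pred) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)


def mean_cross_entropy(correct_tag_probs):
    """Average of -log(p), where p is the probability the model gave the correct tag."""
    return -sum(math.log(p) for p in correct_tag_probs) / len(correct_tag_probs)


# One toy sentence: the model finds the name but misses the email.
gold = [["O", "B-PER", "O", "B-EMAIL"]]
pred = [["O", "B-PER", "O", "O"]]
print(token_accuracy(gold, pred))  # 0.75

# Under the cross-entropy assumption, a loss of 0.30 means the model assigns
# a geometric-mean probability of exp(-0.30) to the correct tag.
print(round(math.exp(-0.30), 2))  # 0.74
```

Because most tokens in ordinary text are tagged O, token-level accuracy can look high even when some PII is missed, as in the toy sentence above.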
Prediction Trace - 5 Layers
Layer 1: Input Text
Layer 2: Text Preprocessing
Layer 3: Feature Extraction
Layer 4: Model Prediction
Layer 5: Redaction
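
The five layers can be traced for a single sentence with a short sketch. This is an illustration under stated assumptions, not the pipeline's actual implementation: the token pattern, the tag-to-placeholder map, and all function names are hypothetical, the embeddings are random stand-ins that only reproduce the 300-feature shape, and the Layer 4 tags are hard-coded because the source does not specify the NER model. Tokens carry character offsets so that redaction can be applied to the original text rather than to the lowercased tokens.

```python
import random
import re

EMBEDDING_DIM = 300  # the "300 features" from the data flow
PLACEHOLDERS = {"PER": "[PERSON]", "EMAIL": "[EMAIL]"}  # assumed tag-to-placeholder map

# Assumed token pattern: whole email addresses, then word runs, then single punctuation marks.
TOKEN_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\w+|[^\w\s]")


def tokenize(text):
    """Layer 2: lowercased tokens, each with (start, end) offsets into the original text."""
    return [(m.group().lower(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def embed(tokens, dim=EMBEDDING_DIM, seed=0):
    """Layer 3 stand-in: random vectors in place of trained embeddings (shape only)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in tokens]


def redact(text, tokens, tags):
    """Layer 5: replace each tagged span of the original text with its placeholder."""
    out, cursor, i = [], 0, 0
    while i < len(tokens):
        if tags[i] == "O":
            i += 1
            continue
        label = tags[i].split("-", 1)[1]  # "B-PER" -> "PER"
        start, end = tokens[i][1], tokens[i][2]
        i += 1
        # Extend the span over any continuation (I-) tags with the same label.
        while i < len(tokens) and tags[i] == "I-" + label:
            end = tokens[i][2]
            i += 1
        out.append(text[cursor:start])
        out.append(PLACEHOLDERS.get(label, "[REDACTED]"))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Hello, my name is Alice and my email is alice@example.com."  # Layer 1
tokens = tokenize(text)                                               # Layer 2
features = embed(tokens)                                              # Layer 3
# Layer 4: one tag per token; a trained NER model would produce these.
tags = ["O", "O", "O", "O", "O", "B-PER", "O", "O", "O", "O", "B-EMAIL", "O"]
print(len(features), len(features[0]))  # 12 300
print(redact(text, tokens, tags))
# Hello, my name is [PERSON] and my email is [EMAIL].
```

Redacting by character offset is one way to reconcile the lowercased tokens of Layer 2 with the original casing that appears in the redacted output.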
Model Quiz - 3 Questions
Test your understanding
What happens to the text in the redaction stage?
A. Text is translated to another language
B. PII is replaced with placeholders
C. Text is summarized
D. Text is converted to audio
Key Insight
This visualization shows how a model learns to find and hide personal information in text. The training improves the model's ability to tag sensitive data, enabling automatic redaction to protect privacy.