Recall & Review
beginner
What does PII stand for in data privacy?
PII stands for Personally Identifiable Information. It includes any data that can identify a specific person, like names, addresses, or phone numbers.
beginner
Why is PII detection important in machine learning?
PII detection helps protect people's privacy by finding sensitive information in data so it can be removed or hidden before using the data for training or analysis.
beginner
What is redaction in the context of PII?
Redaction means removing or hiding sensitive information like PII from text or data to prevent unauthorized access or sharing.
intermediate
Name two common methods used for PII detection.
Two common methods are: 1. Rule-based detection using patterns like regular expressions. 2. Machine learning models trained to recognize PII in text.
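As a minimal sketch of the rule-based approach, the snippet below scans text with two regular expressions and reports each match with its type and position. The pattern set and the function name find_pii are illustrative assumptions, not a complete or standard detector.

```python
import re

# Illustrative patterns only; real PII formats vary far more than this.
PII_PATTERNS = {
    'EMAIL': re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+'),
    'PHONE': re.compile(r'\b\d{3}-\d{3}-\d{4}\b'),
}

def find_pii(text):
    """Return (label, start, end, matched_text) tuples sorted by position."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((label, m.start(), m.end(), m.group()))
    return sorted(hits, key=lambda h: h[1])

sample = 'Contact me at john.doe@example.com or 123-456-7890.'
for hit in find_pii(sample):
    print(hit)
```

Returning positions rather than replacing text right away keeps detection separate from redaction, so the same matches could later be masked, logged, or reviewed.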
intermediate
How can machine learning models help improve PII redaction?
Machine learning models can learn from examples to detect PII more accurately, even when the data is messy or uses different formats, making redaction more reliable.
What type of information is considered PII?
A. A recipe for cake
B. The weather forecast
C. A company's stock price
D. A person's email address
A person's email address is personal information that can identify them, so it is PII.
Which method can be used to detect PII in text?
A. Calculating averages
B. Sorting numbers
C. Using regular expressions
D. Drawing charts
Regular expressions can find patterns like phone numbers or emails to detect PII.
What does redaction do to PII in documents?
A. Highlights it in bright colors
B. Removes or hides it
C. Copies it to another file
D. Prints it in bold
Redaction removes or hides PII to protect privacy.
Why use machine learning for PII detection?
A. To improve detection accuracy on varied data
B. To slow down processing
C. To detect PII only in numbers
D. To replace all data with zeros
Machine learning helps detect PII accurately even when data formats vary.
Which of these is NOT an example of PII?
A. Company's annual revenue
B. Social Security Number
C. Home address
D. Phone number
Company's revenue is not personal information about an individual.
Explain what PII detection and redaction mean and why they are important.
Think about how personal data is found and hidden to keep people safe.
Describe two ways machine learning can help with PII detection and redaction.
Consider how computers learn patterns to find sensitive info.
Practice
1. What is the main purpose of PII detection in text data?
easy
A. To increase the size of the dataset
B. To improve the speed of text processing
C. To find personal information to protect privacy
D. To translate text into different languages
Solution
Step 1: Understand PII detection
PII detection is about finding personal information like names, emails, or phone numbers in text.
Step 2: Identify the purpose
The goal is to protect privacy by recognizing sensitive data that should not be shared openly.
Final Answer:
To find personal information to protect privacy -> Option C
Quick Check:
PII detection = find personal info [OK]
Hint: PII detection means finding personal info to keep it safe [OK]
Common Mistakes:
Confusing PII detection with data translation
Thinking it speeds up processing
Believing it increases dataset size
2. Which of the following is the correct way to redact an email address in text?
easy
A. Replace the email with <EMAIL_REDACTED>
B. Delete the entire sentence containing the email
C. Change the email to a random number
D. Highlight the email in bold
Solution
Step 1: Understand redaction
Redaction hides sensitive info, typically by replacing it with a placeholder. It does not mean deleting the surrounding text or substituting random values.
Step 2: Choose the correct method
Replacing the email with a clear placeholder like <EMAIL_REDACTED> keeps the text readable and safe.
Final Answer:
Replace the email with <EMAIL_REDACTED> -> Option A
Quick Check:
Redaction = replace sensitive info with placeholder [OK]
Hint: Redact by replacing sensitive info with clear placeholders [OK]
Common Mistakes:
Deleting whole sentences instead of redacting
Replacing emails with unrelated data
Highlighting instead of hiding
3. Given this Python code snippet for PII redaction:
import re
text = 'Contact me at john.doe@example.com or 123-456-7890.'
redacted = re.sub(r'\S+@\S+\.\S+', '<EMAIL_REDACTED>', text)
print(redacted)
What will be the output?
medium
A. Contact me at john.doe@example.com or 123-456-7890.
B. Contact me at john.doe@example.com or <EMAIL_REDACTED>.
C. Contact me at <EMAIL_REDACTED> or <EMAIL_REDACTED>.
D. Contact me at <EMAIL_REDACTED> or 123-456-7890.
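Solution
Step 1: Read the pattern
'\S+@\S+\.\S+' matches a run of non-space characters that contains an '@' followed later by a '.', which fits 'john.doe@example.com'.
Step 2: Apply the substitution
re.sub replaces the email with '<EMAIL_REDACTED>'. The phone number contains no '@', so it is left unchanged.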
Final Answer:
Contact me at <EMAIL_REDACTED> or 123-456-7890. -> Option D
Quick Check:
Email replaced, phone unchanged = Contact me at <EMAIL_REDACTED> or 123-456-7890. [OK]
Hint: Regex replaces emails only, phone stays same [OK]
Common Mistakes:
Thinking phone number is replaced
Misreading regex pattern
Assuming no replacement happens
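One caveat worth seeing in code: '\S+' matches any non-space character, so the pattern in this question also swallows punctuation attached to an address. The sketch below compares it with a somewhat stricter pattern. The stricter pattern and the sample sentence are illustrative assumptions, and the pattern is still not a full email validator.

```python
import re

LOOSE = r'\S+@\S+\.\S+'                      # pattern from the question
STRICTER = r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+'   # illustrative, not a full validator

text = 'Write to (john.doe@example.com), thanks.'

# The loose pattern takes the parentheses and comma along with the email.
print(re.sub(LOOSE, '<EMAIL_REDACTED>', text))
# -> Write to <EMAIL_REDACTED> thanks.

# The stricter pattern stops at characters that cannot appear in its classes.
print(re.sub(STRICTER, '<EMAIL_REDACTED>', text))
# -> Write to (<EMAIL_REDACTED>), thanks.
```

Over-matching is usually the safer failure for redaction, since nothing leaks, but it can damage the readability of the surrounding text.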
4. You wrote this code to redact phone numbers:
import re
text = 'Call 555 1234 or 555 5678.'
redacted = re.sub(r'\d{3}-\d{4}', '<PHONE_REDACTED>', text)
print(redacted)
But the output is: 'Call 555 1234 or 555 5678.' What is the likely error?
medium
A. The regex pattern is incorrect and does not match the phone numbers
B. The re.sub function is missing the text argument
C. The print statement is missing parentheses
D. The text variable is empty
Solution
Step 1: Check regex pattern against phone format
The pattern '\d{3}-\d{4}' matches numbers written like '555-1234', with a hyphen between the digit groups. The numbers in the text use a space instead.
Step 2: Confirm if pattern matches text
Because no part of the text has the form three digits, hyphen, four digits, the pattern matches nothing and re.sub returns the text unchanged.
Final Answer:
The regex pattern is incorrect and does not match the phone numbers -> Option A
Quick Check:
Regex mismatch causes no replacement [OK]
Hint: Check regex matches exact phone format in text [OK]
Common Mistakes:
Assuming re.sub syntax error
Forgetting parentheses in print (Python 3+)
Thinking text is empty without checking
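A regular expression matches only the exact format it describes, so a quick way to debug a redaction that changes nothing is to test the pattern against several spellings of the same number. A minimal sketch follows. The tolerant pattern and the sample strings are illustrative assumptions, not an exhaustive phone-number matcher.

```python
import re

STRICT = r'\d{3}-\d{4}'             # requires a hyphen between the digit groups
TOLERANT = r'\b\d{3}[-. ]\d{4}\b'   # accepts a hyphen, dot, or space

samples = ['Call 555-1234.', 'Call 555 1234.', 'Call 555.1234.']

for s in samples:
    # Check whether the strict pattern finds anything at all in this sample.
    strict_hit = re.search(STRICT, s) is not None
    # Redact with the tolerant pattern to compare the result.
    redacted = re.sub(TOLERANT, '<PHONE_REDACTED>', s)
    print(f'{s!r}: strict matches={strict_hit}, tolerant -> {redacted!r}')
```

Only the hyphenated sample satisfies the strict pattern, while the tolerant one redacts all three.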
5. You want to redact both emails and phone numbers in a text using Python. Which combined regex pattern correctly matches emails and US phone numbers like '123-456-7890'?
hard
A. r'\d{3}-\d{4}|\S+@\S+\.\S+'
B. r'\S+@\S+\.\S+|\d{3}-\d{3}-\d{4}'
C. r'\S+@\S+\.\S+\d{3}-\d{3}-\d{4}'
D. r'\S+@\S+\.\S+&\d{3}-\d{3}-\d{4}'
Solution
Step 1: Understand regex for emails and phones
The email pattern '\S+@\S+\.\S+' matches emails; '\d{3}-\d{3}-\d{4}' matches US phone numbers like '123-456-7890'.
Step 2: Combine patterns with OR operator
Using '|' between patterns matches either emails or phone numbers separately.
Step 3: Evaluate options
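r'\S+@\S+\.\S+|\d{3}-\d{3}-\d{4}' correctly uses '|' to combine the two patterns; r'\d{3}-\d{4}|\S+@\S+\.\S+' also uses '|', but its phone part matches only seven digits, so in '123-456-7890' it would redact just '456-7890' and leave '123-' exposed; r'\S+@\S+\.\S+\d{3}-\d{3}-\d{4}' concatenates the patterns, so it matches only an email immediately followed by a phone number; r'\S+@\S+\.\S+&\d{3}-\d{3}-\d{4}' uses '&', which is not an AND operator in regex and is matched as a literal character.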
Final Answer:
r'\S+@\S+\.\S+|\d{3}-\d{3}-\d{4}' -> Option B
Quick Check:
Use '|' to combine regex patterns [OK]
Hint: Use '|' to combine email and phone regex patterns [OK]
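Combining patterns with '|' sends every match to the same replacement. If each type should keep its own placeholder, one option is to wrap the alternatives in named groups and use a replacement function. This is a sketch under that assumption; the group names and the helper redact are hypothetical.

```python
import re

# Same two patterns as the correct option, wrapped in named groups.
PII_RE = re.compile(r'(?P<EMAIL>\S+@\S+\.\S+)|(?P<PHONE>\d{3}-\d{3}-\d{4})')

def redact(text):
    """Replace each match with a placeholder named after the group that matched."""
    # m.lastgroup holds the name of the alternative that produced the match.
    return PII_RE.sub(lambda m: f'<{m.lastgroup}_REDACTED>', text)

print(redact('Contact me at john.doe@example.com or 123-456-7890.'))
# -> Contact me at <EMAIL_REDACTED> or <PHONE_REDACTED>.
```

Typed placeholders keep the redacted text readable and tell downstream readers what kind of value was removed.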