Challenge - 5 Problems

Encoding Mastery

❓ Predict Output

intermediate

What is the output of this code when reading a CSV with encoding errors?

Consider CSV data containing a byte that is not valid UTF-8. What is the value of result after the following code snippet runs?

Pandas

import pandas as pd
from io import BytesIO

bytes_data = b'name,age\nAlice,30\nBob,25\nJos\xe9,22'

try:
    df = pd.read_csv(BytesIO(bytes_data), encoding='utf-8')
    result = df.to_dict()
except Exception as e:
    result = str(type(e))

A. {'name': {0: 'Alice', 1: 'Bob', 2: 'José'}, 'age': {0: 30, 1: 25, 2: 22}}

B. {'name': {0: 'Alice', 1: 'Bob', 2: 'Jos\xE9'}, 'age': {0: 30, 1: 25, 2: 22}}

C. <class 'pandas.errors.ParserError'>

D. <class 'UnicodeDecodeError'>

❓ Data Output

intermediate

What is the content of the DataFrame after reading with encoding_errors='replace'?

Given the same CSV data with an invalid UTF-8 byte, what will the DataFrame contain if we pass encoding_errors='replace' to read_csv?

Pandas

import pandas as pd
from io import BytesIO

bytes_data = b'name,age\nAlice,30\nBob,25\nJos\xe9,22'

df = pd.read_csv(BytesIO(bytes_data), encoding='utf-8', encoding_errors='replace')
result = df.to_dict()

A. {'name': {0: 'Alice', 1: 'Bob', 2: 'Jos�'}, 'age': {0: 30, 1: 25, 2: 22}}

B. {'name': {0: 'Alice', 1: 'Bob', 2: 'Jos\xE9'}, 'age': {0: 30, 1: 25, 2: 22}}

C. <class 'UnicodeDecodeError'>

D. {'name': {0: 'Alice', 1: 'Bob', 2: 'José'}, 'age': {0: 30, 1: 25, 2: 22}}

🔧 Debug

advanced

Why does this code raise a UnicodeDecodeError?

Examine the code below. Why does pd.read_csv raise a UnicodeDecodeError, which the except block then catches?

Pandas

import pandas as pd
from io import BytesIO

bytes_data = b'name,age\nAlice,30\nBob,25\nJos\xe9,22'

try:
    df = pd.read_csv(BytesIO(bytes_data), encoding='utf-8')
    result = df.to_dict()
except Exception as e:
    result = str(type(e))

A. Because the BytesIO object is not supported by pandas

B. Because the byte \xe9 is not valid UTF-8 on its own

C. Because the CSV header is missing

D. Because the data contains a missing value
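
The decoding behavior these first three problems rely on can be reproduced with Python's built-in codecs, without pandas. The sketch below is an illustration and not part of the original challenge. It decodes the last data row (Jos\xe9,22) in three ways. The variable names are made up for the example, and it assumes that read_csv's encoding and encoding_errors arguments correspond to a Python codec name and error handler.

```python
# Last row of the sample CSV: 0xE9 is 'é' in Latin-1, but in UTF-8 it is
# the lead byte of a three-byte sequence and must be followed by two
# continuation bytes. Here it is followed by a comma instead.
raw = b"Jos\xe9,22"

# Strict decoding (the default error handler) rejects the byte.
try:
    raw.decode("utf-8")
    strict_result = "decoded"
except UnicodeDecodeError as exc:
    strict_result = f"{type(exc).__name__} at byte offset {exc.start}"

# errors="replace" substitutes U+FFFD (the replacement character).
replaced = raw.decode("utf-8", errors="replace")

# Latin-1 maps every byte to a code point, so 0xE9 becomes 'é'.
latin1 = raw.decode("latin-1")

# For comparison: valid UTF-8 encodes 'é' as the two bytes 0xC3 0xA9.
utf8_bytes = "José".encode("utf-8")

print(strict_result)   # UnicodeDecodeError at byte offset 3
print(repr(replaced))  # 'Jos\ufffd,22'
print(latin1)          # José,22
print(utf8_bytes)      # b'Jos\xc3\xa9'
```

Note that the replacement character does not recover the original letter: once the byte is replaced, the information that it was 'é' is lost.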

🚀 Application

advanced

How do you correctly read a Latin-1 encoded CSV that contains special characters?

You have a CSV file encoded in Latin-1 with names containing accented characters. Which code snippet correctly reads it into a DataFrame preserving the characters?

A. pd.read_csv('file.csv', encoding='latin-1')

B. pd.read_csv('file.csv', encoding='utf-8')

C. pd.read_csv('file.csv', encoding='ascii')

D. pd.read_csv('file.csv', encoding='utf-16')
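
To see why the declared encoding has to match the bytes on disk, the following sketch writes a small Latin-1 file and reads it back twice using only the standard library. It is a hedged illustration rather than the challenge's own code: the temporary file, the sample row, and the variable names are hypothetical, and the csv module stands in for pd.read_csv on the assumption that both decode the file with the same Python codec.

```python
import csv
import os
import tempfile

# Write hypothetical sample data as Latin-1, so 'é' is stored as byte 0xE9.
with tempfile.NamedTemporaryFile(
    "w", encoding="latin-1", newline="", suffix=".csv", delete=False
) as fh:
    fh.write("name,age\nJosé,22\n")
    path = fh.name

try:
    # Matching encoding: the accented character survives.
    with open(path, encoding="latin-1", newline="") as fh:
        rows = list(csv.DictReader(fh))

    # Mismatched encoding with errors="replace": the character is lost.
    with open(path, encoding="utf-8", errors="replace", newline="") as fh:
        wrong = list(csv.DictReader(fh))
finally:
    os.remove(path)  # clean up the temporary file

print(rows)   # [{'name': 'José', 'age': '22'}]
print(wrong)  # [{'name': 'Jos\ufffd', 'age': '22'}]
```

One caveat: Latin-1 assigns a character to every possible byte, so decoding with it never fails. A file in some other encoding read as Latin-1 produces garbled text silently instead of an error.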

🧠 Conceptual

expert

What is the effect of using encoding='utf-8-sig' when reading a CSV file?

When reading a CSV file that starts with a Byte Order Mark (BOM), what does specifying encoding='utf-8-sig' do?

A. It causes a UnicodeDecodeError because a BOM is not supported

B. It treats the file as ASCII, ignoring UTF-8 characters

C. It removes the BOM from the start of the file so it does not appear in the data

D. It converts all characters to uppercase
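
The codec-level difference between 'utf-8' and 'utf-8-sig' can be checked directly on bytes. This is a minimal sketch of Python's codec behavior only. The sample data and variable names are invented for the example, and it makes no claim about how a particular pandas version treats a BOM when plain 'utf-8' is requested.

```python
import codecs

# Hypothetical CSV bytes that start with the UTF-8 byte order mark (EF BB BF).
data = codecs.BOM_UTF8 + b"name,age\nAlice,30\n"

# Plain UTF-8 keeps the BOM as the character U+FEFF at the start of the text.
plain = data.decode("utf-8")
header_plain = plain.splitlines()[0].split(",")

# utf-8-sig consumes a leading BOM, so the first column name is clean.
stripped = data.decode("utf-8-sig")
header_sig = stripped.splitlines()[0].split(",")

# utf-8-sig also accepts input that has no BOM at all.
no_bom = b"name,age".decode("utf-8-sig")

print(header_plain)  # ['\ufeffname', 'age']
print(header_sig)    # ['name', 'age']
print(no_bom)        # name,age
```

A stray U+FEFF in the first column name is invisible when printed, which is why lookups such as row['name'] can fail unexpectedly on files exported with a BOM.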
