Consider the following Apache Spark code snippet that reads a CSV file with a header row:
df = spark.read.option("header", "true").csv("data.csv")
df.show()

The CSV file data.csv contains:

name,age
Alice,30
Bob,25
What will be the output of df.show()?
When the header option is set to true, Spark uses the first row as column names.
Setting header=true tells Spark to treat the first row as column names, so the columns are name and age, and the two data rows (Alice,30 and Bob,25) follow.
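The header behavior can be sketched without Spark at all, using Python's built-in csv module on the same data (a plain-Python illustration, not Spark's implementation):

```python
import csv
import io

# Same content as data.csv in the question
data = "name,age\nAlice,30\nBob,25\n"

reader = csv.reader(io.StringIO(data))
columns = next(reader)   # first row is consumed as the column names
rows = list(reader)      # the remaining rows are the data

print(columns)  # ['name', 'age']
print(rows)     # [['Alice', '30'], ['Bob', '25']]
```

df.show() would therefore display columns name and age with the two data rows beneath them.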
Given a CSV file data.csv with content:
name;age
Alice;30
Bob;25
What will be the output of this Spark code?
df = spark.read.option("header", "true").option("delimiter", ",").csv("data.csv")
df.show()
If the delimiter option does not match the actual delimiter in the file, Spark treats the whole line as one column.
The CSV file uses a semicolon (;) as its delimiter, but the code sets the delimiter to a comma (,). Since no commas appear in the file, Spark reads each line as a single field, producing one string column named name;age whose rows are Alice;30 and Bob;25.
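The delimiter mismatch can be sketched with Python's csv module (plain Python, not Spark): parsing semicolon-separated lines with a comma delimiter leaves each line as a single field.

```python
import csv
import io

# Same content as data.csv in the question, semicolon-separated
data = "name;age\nAlice;30\nBob;25\n"

# Parse with the WRONG delimiter (comma), mirroring the Spark code
reader = csv.reader(io.StringIO(data), delimiter=",")
header = next(reader)
rows = list(reader)

print(header)  # ['name;age']  -- the whole header line becomes one column name
print(rows)    # [['Alice;30'], ['Bob;25']]
```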
Consider a CSV file data.csv with content:
name,age
Alice,30
Bob,25
Charlie,35
What is the number of rows in the DataFrame after running:
df = spark.read.option("header", "true").option("inferSchema", "true").csv("data.csv")
row_count = df.count()
The CSV has 3 data rows after the header.
The header row is used as column names and not counted as data. The file has 3 data rows, so count() returns 3.
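The count can be sketched in plain Python (not Spark): consume the header row for column names, then count only the data rows.

```python
import csv
import io

# Same content as data.csv in the question
data = "name,age\nAlice,30\nBob,25\nCharlie,35\n"

reader = csv.reader(io.StringIO(data))
next(reader)                          # header row: used for names, not counted
row_count = sum(1 for _ in reader)    # count the remaining data rows

print(row_count)  # 3
```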
What error will this code raise?
df = spark.read.option("header", True).csv("data.csv")
df.show()

Note: the header option value is a boolean True, not a string.
Spark's option method accepts boolean values for boolean options like header.
The option method accepts both strings and booleans for options like header, so passing the boolean True raises no error; the code runs correctly.
You have a CSV file data.csv where fields can contain new lines and the delimiter is a tab character.
Which Spark code snippet correctly reads this CSV preserving multiline fields?
Remember to set the multiLine option to true as a string and use the actual tab character (\t) for the delimiter.
Option A correctly sets header, uses the tab character \t as the delimiter, and enables multiLine as string true. Option B misses the multiLine option, so multiline fields won't be handled properly. Option C uses the string "tab", which is not a valid delimiter. Option D also misses the multiLine option.
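In Spark, the correct read would combine all three options, e.g. spark.read.option("header", "true").option("delimiter", "\t").option("multiLine", "true").csv("data.csv"). What multiLine must preserve can be sketched in plain Python (not Spark) with the csv module, which likewise keeps a quoted embedded newline inside a single field:

```python
import csv
import io

# Hypothetical tab-delimited data: Alice's "note" field spans two lines
# inside quotes, mimicking a multiline CSV field.
data = 'name\tnote\nAlice\t"line one\nline two"\nBob\tok\n'

reader = csv.reader(io.StringIO(data), delimiter="\t")
header = next(reader)
rows = list(reader)

print(header)       # ['name', 'note']
print(rows[0][1])   # the embedded newline is preserved inside one field
print(len(rows))    # 2 data rows, not 3 physical lines
```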