Reading CSV files with options in Apache Spark - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the output of this Spark code that reads a CSV with the header option?

Consider the following Apache Spark code snippet that reads a CSV file with a header row:

df = spark.read.option("header", "true").csv("data.csv")
df.show()

The CSV file data.csv contains:

name,age
Alice,30
Bob,25

What will be the output of df.show()?

A
+-----+---+
| name|age|
+-----+---+
| name|age|
|Alice| 30|
|  Bob| 25|
+-----+---+
B
+-----+---+
|  _c0|_c1|
+-----+---+
| name|age|
|Alice| 30|
|  Bob| 25|
+-----+---+
C
+-----+---+
| name|age|
+-----+---+
|Alice| 30|
|  Bob| 25|
+-----+---+
D
+-----+---+
|  _c0|_c1|
+-----+---+
|Alice| 30|
|  Bob| 25|
+-----+---+
💡 Hint

When the header option is set to true, Spark uses the first row as column names and excludes it from the data rows.
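The effect of the header option can be checked by comparing the column names Spark assigns with and without it. The following is a minimal sketch, not a definitive implementation: the helper names write_sample, column_names, and demo are hypothetical, and demo assumes a local PySpark installation.

```python
import os
import tempfile

# Same content as data.csv in the question above.
SAMPLE_CSV = "name,age\nAlice,30\nBob,25\n"


def write_sample(directory):
    """Write the sample data.csv into `directory` and return its path."""
    path = os.path.join(directory, "data.csv")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE_CSV)
    return path


def column_names(spark, path, header):
    """Return the column names Spark assigns for the given header setting."""
    reader = spark.read.option("header", "true" if header else "false")
    return reader.csv(path).columns


def demo():
    # Imported lazily so the helpers above load without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("header-demo").getOrCreate()
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = write_sample(tmp)
            print(column_names(spark, path, header=True))   # first row becomes the names
            print(column_names(spark, path, header=False))  # default _c0, _c1 names
    finally:
        spark.stop()
```

Calling demo() with PySpark installed should print ['name', 'age'] followed by ['_c0', '_c1'].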

Predict Output
intermediate
What happens if you read a CSV with the wrong delimiter option?

Given a CSV file data.csv with content:

name;age
Alice;30
Bob;25

What will be the output of this Spark code?

df = spark.read.option("header", "true").option("delimiter", ",").csv("data.csv")
df.show()
A
AnalysisException: CSV parsing error due to wrong delimiter
B
+--------+----+
|     _c0| _c1|
+--------+----+
|name;age|null|
|Alice;30|null|
|  Bob;25|null|
+--------+----+
C
+-----+---+
| name|age|
+-----+---+
|Alice| 30|
|  Bob| 25|
+-----+---+
D
+--------+
|name;age|
+--------+
|Alice;30|
|  Bob;25|
+--------+
💡 Hint

If the configured delimiter never appears in the file, Spark reads each line as a single column. With header set to true, the first line becomes that column's name.
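The delimiter mismatch can be reproduced by reading the same semicolon-separated file twice, once with each delimiter. This is a sketch under stated assumptions: write_sample, read_columns, and demo are hypothetical helper names, and demo assumes a local PySpark installation.

```python
import os
import tempfile

# Semicolon-separated sample from the question above.
SAMPLE_CSV = "name;age\nAlice;30\nBob;25\n"


def write_sample(directory):
    """Write the sample data.csv into `directory` and return its path."""
    path = os.path.join(directory, "data.csv")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE_CSV)
    return path


def read_columns(spark, path, delimiter):
    """Read the file with the given delimiter and return the column names."""
    df = (
        spark.read
        .option("header", "true")
        .option("delimiter", delimiter)
        .csv(path)
    )
    return df.columns


def demo():
    # Imported lazily so the helpers above load without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("delimiter-demo").getOrCreate()
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = write_sample(tmp)
            print(read_columns(spark, path, ","))  # comma never occurs: one column
            print(read_columns(spark, path, ";"))  # matching delimiter: two columns
    finally:
        spark.stop()
```

Calling demo() with PySpark installed should print ['name;age'] followed by ['name', 'age'].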

Data Output
advanced
How many rows are in the DataFrame after reading the CSV with the inferSchema option?

Consider a CSV file data.csv with content:

name,age
Alice,30
Bob,25
Charlie,35

What is the number of rows in the DataFrame after running:

df = spark.read.option("header", "true").option("inferSchema", "true").csv("data.csv")
row_count = df.count()
A
2
B
3
C
4
D
0
💡 Hint

The CSV has 3 data rows after the header. The inferSchema option changes the column types, not the number of rows.
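To see that inferSchema affects types rather than the row count, the count and the inferred types can be inspected together. This is a minimal sketch: write_sample, count_and_types, and demo are hypothetical helper names, and demo assumes a local PySpark installation.

```python
import os
import tempfile

# Header line plus three data rows, as in the question above.
SAMPLE_CSV = "name,age\nAlice,30\nBob,25\nCharlie,35\n"


def write_sample(directory):
    """Write the sample data.csv into `directory` and return its path."""
    path = os.path.join(directory, "data.csv")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE_CSV)
    return path


def count_and_types(spark, path):
    """Return (row count, list of (column, type) pairs) for the CSV."""
    df = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(path)
    )
    return df.count(), df.dtypes


def demo():
    # Imported lazily so the helpers above load without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("infer-demo").getOrCreate()
    try:
        with tempfile.TemporaryDirectory() as tmp:
            count, types = count_and_types(spark, write_sample(tmp))
            print(count)  # data rows only; the header line is not counted
            print(types)  # age is inferred as a numeric type instead of string
    finally:
        spark.stop()
```

Without inferSchema, every column is read as a string, while the row count stays the same.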

🔧 Debug
advanced
What error does this Spark CSV read code raise?

What error will this code raise?

df = spark.read.option("header", True).csv("data.csv")
df.show()

Note: The header option value is a boolean True, not a string.

A
No error, runs correctly showing data with header
B
TypeError: option value must be a string
C
AnalysisException: CSV file not found
D
ValueError: invalid option value for header
💡 Hint

PySpark's option method accepts Python booleans and converts them to the strings "true" and "false", so True works for boolean options such as header.
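The reason a Python boolean works is that PySpark turns option values into strings before handing them to the JVM reader. The function below is a simplified stand-in for that conversion, written for illustration only; the name option_value_to_str is hypothetical and is not part of PySpark's public API.

```python
def option_value_to_str(value):
    """Illustrative stand-in for how PySpark normalizes option values."""
    if isinstance(value, bool):
        # Python booleans become the lowercase strings the JVM side expects.
        return str(value).lower()
    if value is None:
        # None is passed through unchanged.
        return None
    # Everything else (strings, numbers) is converted with str().
    return str(value)


# A boolean and its string form end up as the same option value.
print(option_value_to_str(True))    # true
print(option_value_to_str("true"))  # true
print(option_value_to_str(False))   # false
```

With PySpark, spark.read.option("header", True) and spark.read.option("header", "true") therefore configure the reader identically.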

🚀 Application
expert
Which option correctly reads a CSV with multiline fields and a tab delimiter?

You have a CSV file data.csv where fields can contain newlines and the delimiter is a tab character.

Which Spark code snippet correctly reads this CSV preserving multiline fields?

A
df = spark.read.option("header", "true").option("delimiter", "\t").option("multiLine", "true").csv("data.csv")
B
df = spark.read.option("header", true).option("delimiter", "\t").csv("data.csv")
C
df = spark.read.option("header", "true").option("delimiter", "tab").option("multiLine", "true").csv("data.csv")
D
df = spark.read.option("header", "true").option("delimiter", "\t").csv("data.csv")
💡 Hint

Set the multiLine option to true so quoted fields can span several lines, and pass the actual tab character "\t" as the delimiter, not the word "tab".
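A small tab-separated sample makes the difference between physical lines and logical records visible. This is a sketch under stated assumptions: the sample content and the helper names logical_records, read_multiline_tsv, and demo are hypothetical, Python's csv module is used only as a Spark-free cross-check, and demo assumes a local PySpark installation.

```python
import csv
import io
import os
import tempfile

# Tab-separated sample; the quoted note for Alice spans two physical lines.
SAMPLE_TSV = 'name\tnote\nAlice\t"line one\nline two"\nBob\tsingle\n'


def logical_records(text):
    """Parse with Python's csv module as a Spark-free cross-check."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def read_multiline_tsv(spark, path):
    """Read a tab-delimited file whose quoted fields may contain newlines."""
    return (
        spark.read
        .option("header", "true")
        .option("delimiter", "\t")
        .option("multiLine", "true")
        .csv(path)
    )


def demo():
    # Imported lazily so the helpers above load without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("multiline-demo").getOrCreate()
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "data.csv")
            with open(path, "w", encoding="utf-8", newline="") as handle:
                handle.write(SAMPLE_TSV)
            df = read_multiline_tsv(spark, path)
            print(df.count())  # logical records, not physical lines
            df.show(truncate=False)
    finally:
        spark.stop()
```

The sample has four physical lines but only a header and two records. With multiLine enabled, demo() should report two rows, whereas without it Spark splits the quoted field at the line break.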