Consider the following Apache Spark code snippet that reads a CSV file with a header row:
df = spark.read.option("header", "true").csv("data.csv")
df.show()

The CSV file data.csv contains:

name,age
Alice,30
Bob,25
What will be the output of df.show()?
When the header option is set to true, Spark uses the first row as column names.
Setting header=true tells Spark to treat the first row as column names, so the columns are name and age, and the two data rows (Alice,30 and Bob,25) follow.
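The header behavior can be sketched without Spark at all, using Python's built-in csv module on the same data (a plain-Python illustration, not Spark's implementation):

```python
import csv
import io

# Same content as data.csv in the question
data = "name,age\nAlice,30\nBob,25\n"

reader = csv.reader(io.StringIO(data))
columns = next(reader)   # first row is consumed as the column names
rows = list(reader)      # the remaining rows are the data

print(columns)  # ['name', 'age']
print(rows)     # [['Alice', '30'], ['Bob', '25']]
```

df.show() would therefore display columns name and age with the two data rows beneath them.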
Given a CSV file data.csv with content:
name;age
Alice;30
Bob;25
What will be the output of this Spark code?
df = spark.read.option("header", "true").option("delimiter", ",").csv("data.csv")
df.show()
If the delimiter option does not match the actual delimiter in the file, Spark treats the whole line as one column.
The CSV file uses a semicolon (;) as its delimiter, but the code sets the delimiter to a comma (,). Since no commas appear in the file, Spark reads each line as a single field, producing one string column named name;age whose rows are Alice;30 and Bob;25.
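The delimiter mismatch can be sketched with Python's csv module (plain Python, not Spark): parsing semicolon-separated lines with a comma delimiter leaves each line as a single field.

```python
import csv
import io

# Same content as data.csv in the question, semicolon-separated
data = "name;age\nAlice;30\nBob;25\n"

# Parse with the WRONG delimiter (comma), mirroring the Spark code
reader = csv.reader(io.StringIO(data), delimiter=",")
header = next(reader)
rows = list(reader)

print(header)  # ['name;age']  -- the whole header line becomes one column name
print(rows)    # [['Alice;30'], ['Bob;25']]
```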
Consider a CSV file data.csv with content:
name,age
Alice,30
Bob,25
Charlie,35
What is the number of rows in the DataFrame after running:
df = spark.read.option("header", "true").option("inferSchema", "true").csv("data.csv")
row_count = df.count()
The CSV has 3 data rows after the header.
The header row is used as column names and not counted as data. The file has 3 data rows, so count() returns 3.
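The count can be sketched in plain Python (not Spark): consume the header row for column names, then count only the data rows.

```python
import csv
import io

# Same content as data.csv in the question
data = "name,age\nAlice,30\nBob,25\nCharlie,35\n"

reader = csv.reader(io.StringIO(data))
next(reader)                          # header row: used for names, not counted
row_count = sum(1 for _ in reader)    # count the remaining data rows

print(row_count)  # 3
```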
What error will this code raise?
df = spark.read.option("header", True).csv("data.csv")
df.show()

Note: the header option value is a boolean True, not a string.
Spark's option method accepts boolean values for boolean options like header.
The option method accepts both strings and booleans for options like header, so passing the boolean True raises no error; the code runs correctly.
You have a CSV file data.csv where fields can contain new lines and the delimiter is a tab character.
Which Spark code snippet correctly reads this CSV preserving multiline fields?
Remember to set the multiLine option to true as a string and use the actual tab character (\t) for the delimiter.
Option A correctly sets header, uses the tab character \t as the delimiter, and enables multiLine as string true. Option B misses the multiLine option, so multiline fields won't be handled properly. Option C uses the string "tab", which is not a valid delimiter. Option D also misses the multiLine option.
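In Spark, the correct read would combine all three options, e.g. spark.read.option("header", "true").option("delimiter", "\t").option("multiLine", "true").csv("data.csv"). What multiLine must preserve can be sketched in plain Python (not Spark) with the csv module, which likewise keeps a quoted embedded newline inside a single field:

```python
import csv
import io

# Hypothetical tab-delimited data: Alice's "note" field spans two lines
# inside quotes, mimicking a multiline CSV field.
data = 'name\tnote\nAlice\t"line one\nline two"\nBob\tok\n'

reader = csv.reader(io.StringIO(data), delimiter="\t")
header = next(reader)
rows = list(reader)

print(header)       # ['name', 'note']
print(rows[0][1])   # the embedded newline is preserved inside one field
print(len(rows))    # 2 data rows, not 3 physical lines
```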