Recall & Review
beginner
What is the purpose of using options when reading CSV files in Apache Spark?
Options customize how Spark parses a CSV file: the delimiter, whether the first row is a header, which strings count as null, and so on. Setting them correctly ensures the data is read as intended.
beginner
How do you tell Spark that the CSV file has a header row?
You use the option "header" set to "true" like this: .option("header", "true"). This tells Spark to treat the first row as column names.
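A minimal PySpark sketch of this option. It assumes `spark` is an existing SparkSession; the function name and the path passed to it are hypothetical.

```python
def read_with_header(spark, path):
    """Read a CSV file, using its first row as the column names."""
    # `spark` is assumed to be an existing SparkSession;
    # `path` is a hypothetical file location.
    return spark.read.option("header", "true").csv(path)
```

Called as `read_with_header(spark, "people.csv")`, the returned DataFrame takes its column names from the first line of the file.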
intermediate
What does the option "inferSchema" do when reading a CSV file?
Setting .option("inferSchema", "true") tells Spark to detect the data type of each column automatically instead of reading every column as a string. Inference requires an extra pass over the data.
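As a rough illustration, the option can be chained with "header" as below. This is a sketch that assumes `spark` is an existing SparkSession; the function name and path are hypothetical.

```python
def read_with_inferred_types(spark, path):
    """Read a headered CSV and let Spark infer each column's type."""
    # `spark` is assumed to be an existing SparkSession; `path` is hypothetical.
    return (
        spark.read
        .option("header", "true")       # first row holds column names
        .option("inferSchema", "true")  # scan the data to pick column types
        .csv(path)
    )
```

Calling `printSchema()` on the result shows the detected types; without "inferSchema", every column is reported as a string.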
beginner
How can you change the delimiter used in a CSV file when reading it with Spark?
Use the option "sep" with the delimiter character, for example: .option("sep", ";") to use semicolon as the separator instead of comma.
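A short sketch of the "sep" option in PySpark. It assumes `spark` is an existing SparkSession and that the file at the hypothetical `path` separates fields with semicolons.

```python
def read_semicolon_csv(spark, path):
    """Read a CSV file whose fields are separated by semicolons."""
    # `spark` is assumed to be an existing SparkSession; `path` is hypothetical.
    return spark.read.option("sep", ";").csv(path)
```

Without this option, Spark splits on commas, so a line such as `a;b;c` would be read as a single column.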
beginner
What happens if you do not specify the "header" option when reading a CSV file in Spark?
The header option defaults to false, so Spark treats every row, including the first, as data and assigns default column names such as _c0, _c1, and so on.
Which option tells Spark to treat the first row of a CSV file as column headers?
The option header = true tells Spark to use the first row as column names.
What does setting inferSchema = true do when reading a CSV file?
inferSchema = true makes Spark detect the data types of columns automatically.
How do you specify a semicolon as the delimiter in a CSV file read by Spark?
The option sep sets the delimiter character, so sep = ";" uses semicolon.
If you do not set header = true, what will Spark do with the first row?
Without header = true, Spark treats the first row as normal data.
Which option can you use to specify a string to treat as null in CSV data?
The nullValue option lets you specify which string should be treated as null.
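A brief sketch of "nullValue" in PySpark. It assumes `spark` is an existing SparkSession; the function name, the path, and the marker string "NA" are hypothetical placeholders.

```python
def read_with_null_marker(spark, path, marker="NA"):
    """Read a headered CSV, treating cells equal to `marker` as null."""
    # `spark` is assumed to be an existing SparkSession; `path` and the
    # default marker "NA" are hypothetical placeholders.
    return (
        spark.read
        .option("header", "true")
        .option("nullValue", marker)  # cells equal to `marker` become null
        .csv(path)
    )
```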
Explain how to read a CSV file in Spark that has a header row, uses semicolon as delimiter, and requires automatic schema detection.
Think about the options needed for header, delimiter, and schema.
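One possible answer, sketched in PySpark by chaining the three options before the read. It assumes `spark` is an existing SparkSession; the function name and path are hypothetical.

```python
def read_semicolon_csv_with_schema(spark, path):
    """Read a headered, semicolon-delimited CSV with inferred column types."""
    # `spark` is assumed to be an existing SparkSession; `path` is hypothetical.
    return (
        spark.read
        .option("header", "true")       # first row holds column names
        .option("sep", ";")             # fields are separated by semicolons
        .option("inferSchema", "true")  # detect column types from the data
        .csv(path)
    )
```

The order of the `.option(...)` calls does not matter, as long as they all come before `.csv(path)`.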
What are the effects of not setting the header option when reading CSV files in Spark?
Consider how Spark names columns and treats rows by default.