beginner

What is the purpose of using options when reading CSV files in Apache Spark?

Options customize how Spark parses a CSV file: which character separates fields, whether the first row is a header, and which strings count as null. Setting them correctly ensures the resulting DataFrame matches the file's actual layout.

beginner

How do you tell Spark that the CSV file has a header row?

You use the option "header" set to "true" like this: .option("header", "true"). This tells Spark to treat the first row as column names.

intermediate

What does the option "inferSchema" do when reading a CSV file?

Setting .option("inferSchema", "true") tells Spark to detect each column's data type automatically instead of reading every column as a string. Inference requires an extra pass over the data.

beginner

How can you change the delimiter used in a CSV file when reading it with Spark?

Use the option "sep" with the delimiter character, for example .option("sep", ";") to use a semicolon as the separator instead of a comma. Spark also accepts "delimiter" as an alias for "sep".

beginner

What happens if you do not specify the "header" option when reading a CSV file in Spark?

The "header" option defaults to "false", so Spark treats every row, including the first, as data. Columns get default names such as _c0, _c1, and so on.

Which option tells Spark to treat the first row of a CSV file as column headers?

A. header = true

B. inferSchema = true

C. sep = ","

D. nullValue = "null"

What does setting inferSchema = true do when reading a CSV file?

A. It skips the first row

B. It changes the delimiter

C. It automatically detects column data types

D. It removes null values

Using the option name given in Spark's documentation, how do you specify a semicolon as the delimiter in a CSV file read by Spark?

A. option("delimiter", ";")

B. option("sep", ";")

C. option("header", ";")

D. option("inferSchema", ";")

If you do not set header = true, what will Spark do with the first row?

A. Treat it as data

B. Skip it

C. Use it as column names

D. Throw an error

Which option can you use to specify a string to treat as null in CSV data?

A. sep

B. header

C. inferSchema

D. nullValue

Explain how to read a CSV file in Spark that has a header row, uses semicolon as delimiter, and requires automatic schema detection.

What are the effects of not setting the header option when reading CSV files in Spark?