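Options tell Spark how to parse a CSV file: whether the first row is a header, which character separates the fields, whether to infer column types, and what to do with malformed rows. Without them Spark falls back to its defaults (comma separator, no header, every column read as a string), which may not match the file.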
Reading CSV files with options in Apache Spark
spark.read.option("option_name", "option_value").csv("file_path")
You can chain multiple option() calls to set different options.
Common options include header, sep, inferSchema, and mode.
df = spark.read.option("header", "true").csv("data.csv")
df = spark.read.option("sep", ";").option("header", "true").csv("data.csv")
df = spark.read.option("inferSchema", "true").csv("data.csv")
df = spark.read.option("mode", "DROPMALFORMED").csv("data.csv")
This program reads a CSV file named example.csv that uses semicolons to separate values and has a header row. It also asks Spark to guess the data types of each column. Then it prints the schema and the data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("CSVOptionsExample").getOrCreate()

# Read CSV with header, semicolon separator, and infer schema
file_path = "example.csv"
df = (
    spark.read.option("header", "true")
    .option("sep", ";")
    .option("inferSchema", "true")
    .csv(file_path)
)

# Show the data
print("DataFrame schema:")
df.printSchema()
print("DataFrame content:")
df.show()

spark.stop()
Setting header to true tells Spark to use the first row as column names. Otherwise the columns are named _c0, _c1, and so on, and the first row is treated as data.
The inferSchema option makes Spark take an extra pass over the data to work out column types, which can slow down reading large files. Without it, every column is read as a string.
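The mode option controls how Spark handles bad lines: PERMISSIVE (default) keeps the row and sets fields it cannot parse to null, DROPMALFORMED skips the row, and FAILFAST raises an error at the first bad line.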
Use options to customize how Spark reads CSV files.
Common options include header, sep, inferSchema, and mode.
Setting options correctly helps Spark read data accurately and avoid errors.