How to Configure Spark Memory in PySpark for Optimal Performance
To configure Spark memory in PySpark, set memory-related properties such as
spark.executor.memory and spark.driver.memory using SparkConf or when creating a SparkSession. These settings control how much memory is allocated to the executors and the driver, helping you optimize performance.

Syntax
You configure Spark memory settings by specifying properties in SparkConf or directly in SparkSession.builder.config(). The main properties are:
- spark.executor.memory: Memory allocated to each executor process.
- spark.driver.memory: Memory allocated to the driver program.
- spark.memory.fraction: Fraction of the JVM heap used for execution and storage.
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MemoryConfigExample") \
    .config("spark.executor.memory", "2g") \
    .config("spark.driver.memory", "1g") \
    .config("spark.memory.fraction", "0.6") \
    .getOrCreate()
```
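Memory values are given as JVM-style size strings, with suffixes such as m (mebibytes) and g (gibibytes). As a rough illustration of what those strings mean (this is not Spark's own parser), the conversion to bytes looks like:

```python
# Rough sketch of how JVM-style memory strings such as "512m" or "2g"
# map to byte counts; an illustration, not Spark's internal parser.
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

def memory_string_to_bytes(value: str) -> int:
    value = value.strip().lower()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)  # bare numbers are treated as bytes

print(memory_string_to_bytes("2g"))    # 2147483648
print(memory_string_to_bytes("512m"))  # 536870912
```

So "2g" requests two gibibytes per executor, not two gigabytes in the decimal sense.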
Example
This example shows how to create a SparkSession with custom memory settings and print the configured values.
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MemoryConfigExample") \
    .config("spark.executor.memory", "3g") \
    .config("spark.driver.memory", "2g") \
    .getOrCreate()

conf = spark.sparkContext.getConf()
executor_mem = conf.get("spark.executor.memory")
driver_mem = conf.get("spark.driver.memory")

print(f"Executor Memory: {executor_mem}")
print(f"Driver Memory: {driver_mem}")

spark.stop()
```
Output
Executor Memory: 3g
Driver Memory: 2g
Common Pitfalls
Common mistakes when configuring Spark memory include:
- Setting memory too low, causing frequent garbage collection or out-of-memory errors.
- Setting memory too high, which can cause the system to swap or crash.
- Forgetting to configure executor and driver memory separately.
- Not considering the total cluster resources available.
Always monitor your Spark application's memory usage and adjust accordingly.
```python
from pyspark.sql import SparkSession

# Wrong: setting executor memory too low causes frequent GC or OOM errors
spark_wrong = SparkSession.builder \
    .appName("WrongMemory") \
    .config("spark.executor.memory", "256m") \
    .getOrCreate()
spark_wrong.stop()  # stop the first session so the new config takes effect

# Right: setting executor memory to a reasonable value
spark_right = SparkSession.builder \
    .appName("RightMemory") \
    .config("spark.executor.memory", "4g") \
    .getOrCreate()
```
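One pitfall above is ignoring total cluster resources. A rough per-executor sizing sketch is shown below; the 1 GB OS reservation and 10% overhead allowance are common rule-of-thumb assumptions, not values prescribed by Spark.

```python
# Rough per-executor memory sizing sketch. The OS reservation and the
# overhead fraction (for spark.executor.memoryOverhead) are assumed
# rule-of-thumb values, not official Spark formulas.
def executor_memory_gb(node_memory_gb: int, executors_per_node: int,
                       os_reserved_gb: int = 1,
                       overhead_fraction: float = 0.10) -> int:
    usable = node_memory_gb - os_reserved_gb       # leave room for the OS
    per_executor = usable / executors_per_node     # split among executors
    heap = per_executor * (1 - overhead_fraction)  # leave room for overhead
    return int(heap)

# A 64 GB node running 3 executors leaves roughly an 18g heap each.
print(f"{executor_memory_gb(64, 3)}g")
```

The resulting value is what you would pass as spark.executor.memory; always validate it against what your cluster manager actually offers per node.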
Quick Reference
| Property | Description | Example Value |
|---|---|---|
| spark.executor.memory | Memory per executor process | 2g |
| spark.driver.memory | Memory for the driver program | 1g |
| spark.memory.fraction | Fraction of JVM heap for execution/storage | 0.6 |
| spark.memory.storageFraction | Share of the unified memory pool reserved for storage | 0.5 |
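The two fraction properties carve up the executor heap. In Spark's unified memory model, roughly 300 MB of the heap is reserved first, spark.memory.fraction is applied to the remainder, and spark.memory.storageFraction marks the storage share within that pool. A sketch of the arithmetic:

```python
# Sketch of how the unified memory model divides an executor heap.
# Spark reserves roughly 300 MB of the heap before applying the fractions.
RESERVED_MB = 300

def memory_regions_mb(heap_mb: int, fraction: float = 0.6,
                      storage_fraction: float = 0.5):
    usable = heap_mb - RESERVED_MB
    unified = usable * fraction           # shared execution + storage pool
    storage = unified * storage_fraction  # eviction-protected storage share
    execution = unified - storage
    return unified, storage, execution

unified, storage, execution = memory_regions_mb(2048)  # a 2g executor heap
print(f"unified={unified:.0f}MB storage={storage:.0f}MB execution={execution:.0f}MB")
```

With the defaults, a 2g heap yields a unified pool of about 1049 MB, split evenly between storage and execution; execution can borrow from the storage share, but cached blocks below the storageFraction boundary are protected from eviction.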
Key Takeaways
- Set spark.executor.memory and spark.driver.memory to control memory allocation.
- Avoid setting memory too low or too high to prevent errors and performance issues.
- Use SparkSession.builder.config() to apply memory settings in PySpark.
- Monitor your Spark application's memory usage and adjust configurations as needed.
- Remember to consider total cluster resources when tuning memory settings.