Spark can run in different modes depending on data size and available resources: local mode runs on a single machine, while cluster mode distributes work across many machines.
Local mode vs cluster mode in Apache Spark
Introduction
Local mode is a good fit for:
Testing small data or code on your own laptop.
Learning Spark without needing a big setup.
Running quick jobs that don't need much power.
Cluster mode is a good fit for:
Processing large data that needs many computers.
Sharing work across a team using a cluster.
Syntax
Apache Spark
spark = SparkSession.builder.master("local[*]").appName("MyApp").getOrCreate()

# For cluster mode, the master URL changes, e.g.:
spark = SparkSession.builder.master("spark://master-url:7077").appName("MyApp").getOrCreate()
local[*] tells Spark to use all CPU cores on your machine.
Cluster mode needs the address of the cluster's master node so the driver can connect to it.
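Because the mode is determined entirely by the master string, its parsing rules can be sketched in plain Python. The local_core_count helper below is hypothetical (it is not part of Spark's API); it only illustrates how Spark interprets local, local[N], and local[*]:

```python
import os

# Hypothetical helper: work out how many worker threads a local
# master URL requests. Not part of Spark; for illustration only.
def local_core_count(master: str) -> int:
    if not master.startswith("local"):
        raise ValueError("not a local master URL")
    if master == "local":
        return 1  # bare "local" means a single worker thread
    inside = master[master.index("[") + 1 : master.index("]")]
    if inside == "*":
        return os.cpu_count()  # "*" means all available cores
    return int(inside)

print(local_core_count("local[4]"))  # 4
```
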
Examples
Runs Spark locally using 1 CPU core.
Apache Spark
spark = SparkSession.builder.master("local[1]").appName("TestApp").getOrCreate()
Runs Spark locally using all available CPU cores.
Apache Spark
spark = SparkSession.builder.master("local[*]").appName("TestApp").getOrCreate()
Connects to a Spark cluster at the given IP address and port.
Apache Spark
spark = SparkSession.builder.master("spark://192.168.1.100:7077").appName("ClusterApp").getOrCreate()
Sample Program
This code runs Spark locally on your computer using all CPU cores. It creates a small table of fruits and shows it. Then it prints the mode Spark is running in.
Apache Spark
from pyspark.sql import SparkSession

# Create a Spark session in local mode using all cores
spark = SparkSession.builder.master("local[*]").appName("LocalVsClusterDemo").getOrCreate()

# Create a simple DataFrame
data = [(1, "apple"), (2, "banana"), (3, "cherry")]
columns = ["id", "fruit"]
df = spark.createDataFrame(data, columns)

# Show the DataFrame
print("DataFrame content:")
df.show()

# Print the Spark master URL to confirm the mode
print(f"Running Spark in mode: {spark.sparkContext.master}")

spark.stop()
Output
DataFrame content:
+---+------+
| id| fruit|
+---+------+
|  1| apple|
|  2|banana|
|  3|cherry|
+---+------+

Running Spark in mode: local[*]
Important Notes
Local mode is great for learning and small tasks, but it cannot handle truly large datasets.
Cluster mode requires setting up multiple machines, but it can process large datasets quickly.
Always check the master setting so you know where your Spark job is running.
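To make that "check the master setting" advice concrete, here is a minimal sketch that classifies the string returned by spark.sparkContext.master. The describe_master helper is hypothetical, not part of Spark:

```python
# Hypothetical helper: classify a Spark master URL as local or cluster.
def describe_master(master: str) -> str:
    if master.startswith("local"):
        return "local mode"
    if master.startswith("spark://"):
        return "standalone cluster mode"
    return "other cluster manager (e.g. YARN or Kubernetes)"

# spark.sparkContext.master returns strings like these:
print(describe_master("local[*]"))                    # local mode
print(describe_master("spark://192.168.1.100:7077"))  # standalone cluster mode
```
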
Summary
Local mode runs Spark on one computer, good for small data and testing.
Cluster mode runs Spark on many computers, needed for big data.
You choose the mode by setting the master parameter when creating the Spark session.