Spark Local Mode vs Cluster Mode

Spark supports several deployment configurations. The most fundamental distinction is between local mode, where everything runs in a single JVM on your machine, and cluster mode, where the driver and executors are distributed across the machines of a cluster.

Local Mode

In local mode, Spark runs the driver and the executor logic inside a single JVM process and executes tasks on threads within that process. No cluster manager is involved.

from pyspark.sql import SparkSession

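# Pick ONE of the master strings below. getOrCreate() returns the session that
# is already running, so call spark.stop() before switching to another master.

# local: single worker thread (good for debugging)
spark = SparkSession.builder.master("local").appName("Test").getOrCreate()
spark.stop()

# local[4]: 4 worker threads
spark = SparkSession.builder.master("local[4]").appName("Test").getOrCreate()
spark.stop()

# local[*]: one worker thread per logical CPU core
spark = SparkSession.builder.master("local[*]").appName("Test").getOrCreate()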
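To make the three master strings concrete, here is a minimal sketch of how they map to a worker-thread count. It is an illustration only, not Spark's own parser: the helper name `local_thread_count` is hypothetical, and `os.cpu_count()` stands in for the JVM's view of available cores.

```python
import os
import re

# Illustrative pattern for "local[N]" and "local[*]"; not Spark's implementation.
_LOCAL_N = re.compile(r"^local\[(\d+|\*)\]$")


def local_thread_count(master: str) -> int:
    """Return the number of worker threads a local master string asks for."""
    if master == "local":
        return 1  # plain "local" means a single worker thread
    match = _LOCAL_N.match(master)
    if match is None:
        raise ValueError(f"not a local master string: {master!r}")
    threads = match.group(1)
    if threads == "*":
        # os.cpu_count() may return None on unusual platforms; fall back to 1.
        return os.cpu_count() or 1
    return int(threads)


for m in ("local", "local[4]", "local[*]"):
    print(m, "->", local_thread_count(m))
```

A string such as `yarn` or `spark://master-host:7077` raises `ValueError` here, since it names a cluster manager rather than a local thread count.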

Local mode is ideal for:

Unit testing Spark code
Developing and debugging transformations
Datasets small enough to process on a single machine
CI/CD pipeline tests
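For the testing use cases in the list above, a small helper can keep local sessions lightweight. The sketch below is one possible setup, not a prescribed one: the function names `local_test_conf` and `make_test_session` are hypothetical, and the chosen values (2 threads, 4 shuffle partitions, UI disabled) are assumptions that suit tiny test datasets.

```python
from typing import Dict


def local_test_conf(threads: int = 2) -> Dict[str, str]:
    """Settings that tend to keep local-mode test sessions small and fast."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return {
        "spark.master": f"local[{threads}]",
        # Tests rarely need the web UI; disabling it avoids port clashes on CI.
        "spark.ui.enabled": "false",
        # The default of 200 shuffle partitions is wasteful for tiny test data.
        "spark.sql.shuffle.partitions": "4",
    }


def make_test_session(app_name: str = "unit-tests", threads: int = 2):
    """Build a local SparkSession; pyspark is imported lazily on purpose."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in local_test_conf(threads).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In a pytest suite, `make_test_session()` would typically sit inside a session-scoped fixture that calls `spark.stop()` on teardown, so every test shares one JVM.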

Cluster Mode

In cluster mode, Spark connects to a cluster manager (YARN, Kubernetes, Standalone) that allocates resources across multiple machines.

# Submit to YARN
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  --driver-memory 4g \
  my_pipeline.py

# Submit to Kubernetes
spark-submit \
  --master k8s://https://k8s-api:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=my-spark:3.5 \
  my_pipeline.py

# Submit to Standalone cluster
# (Standalone does not support --deploy-mode cluster for Python applications)
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  my_pipeline.py
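The resource flags in the YARN example add up quickly, so it can help to work out what they request in total. The sketch below does that arithmetic and shows the configuration properties each flag corresponds to. The class name `SubmitResources` is hypothetical, and the totals ignore the extra per-container memory overhead that the cluster manager reserves on top of the heap.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SubmitResources:
    num_executors: int       # --num-executors
    executor_cores: int      # --executor-cores
    executor_memory_gb: int  # --executor-memory (in GB)
    driver_memory_gb: int    # --driver-memory (in GB)

    @property
    def total_task_slots(self) -> int:
        # With the default spark.task.cpus=1, each executor core runs one task.
        return self.num_executors * self.executor_cores

    @property
    def total_heap_gb(self) -> int:
        # Executor heaps plus the driver heap; memory overhead is not included.
        return self.num_executors * self.executor_memory_gb + self.driver_memory_gb

    def as_conf(self) -> Dict[str, str]:
        """The same request expressed as Spark configuration properties."""
        return {
            "spark.executor.instances": str(self.num_executors),
            "spark.executor.cores": str(self.executor_cores),
            "spark.executor.memory": f"{self.executor_memory_gb}g",
            "spark.driver.memory": f"{self.driver_memory_gb}g",
        }


yarn_job = SubmitResources(
    num_executors=10, executor_cores=4, executor_memory_gb=8, driver_memory_gb=4
)
print(yarn_job.total_task_slots)  # 40 concurrent tasks
print(yarn_job.total_heap_gb)     # 84 GB of JVM heap
```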

Client vs Cluster Deploy Mode

When submitting to a cluster, `--deploy-mode` determines where the driver process runs:

Aspect	`--deploy-mode client`	`--deploy-mode cluster`
Driver location	Submitting machine	A node inside the cluster, chosen by the cluster manager
Best for	Interactive notebooks, debugging	Production batch jobs
stdout/stderr	In your terminal	In cluster logs
Submitting machine must stay connected	For the whole run	Only at submission
Network	Driver on the client machine, possibly far from the executors and data	Driver runs inside the cluster, close to the executors

Comparison Table

Aspect	Local Mode	Cluster Mode
Hardware	Single machine	Multiple machines
Fault tolerance	Limited (failed tasks are not retried by default, and losing the machine ends the job)	Task-level (failed tasks are retried, on other nodes if needed)
Scalability	Single machine	Thousands of cores
Data size	GB scale	TB to PB scale
Cluster manager	None	YARN / Kubernetes / Standalone
Spark UI	http://localhost:4040 (default port)	URL provided by the cluster manager
Setup complexity	None	Cluster provisioning required

Writing Portable Code

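import os
from pyspark.sql import SparkSession

builder = SparkSession.builder.appName("PortableApp")

# Set the master in code only when SPARK_MASTER is defined. A master set in
# code overrides spark-submit --master, so a hard-coded "local[*]" default
# would silently keep a production job on a single machine.
master = os.environ.get("SPARK_MASTER")
if master:
    builder = builder.master(master)

spark = builder.getOrCreate()

# Dev:  SPARK_MASTER="local[4]" python my_app.py
# Prod: spark-submit --master yarn my_app.py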
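One thing to keep in mind with this pattern is the order in which Spark resolves settings: values set in application code win over `spark-submit` flags, which in turn win over `spark-defaults.conf`. The sketch below models that order for the master setting only. It is a simplified illustration with a hypothetical function name (`effective_master`), not Spark's actual resolution code.

```python
from typing import Optional


def effective_master(
    in_code: Optional[str],
    submit_flag: Optional[str],
    defaults_file: Optional[str],
) -> str:
    """Return the first configured master, highest precedence first."""
    for candidate in (in_code, submit_flag, defaults_file):
        if candidate:
            return candidate
    # With no master configured anywhere, spark-submit falls back to local[*].
    return "local[*]"


# A master set in code overrides the --master flag:
print(effective_master("local[*]", "yarn", None))  # local[*]

# Leaving the master out of the code lets the flag take effect:
print(effective_master(None, "yarn", None))        # yarn
```

This is why portable applications usually avoid a hard-coded master and leave the choice to the launch command or environment.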