Spark Local Mode vs Cluster Mode

Spark supports multiple deployment configurations. The most fundamental distinction is between local mode, where everything runs in a single JVM on your machine, and cluster mode, where the driver and executors are distributed across multiple machines in a cluster.


Local Mode

In local mode, Spark runs the driver and the executor inside a single JVM process, and tasks run as threads within that process. No cluster manager is involved.

from pyspark.sql import SparkSession

# Pick ONE of the master URLs below. getOrCreate() returns the session that is
# already running, so a later .master(...) call in the same process is ignored.

# local: single thread (good for debugging)
spark = SparkSession.builder.master("local").appName("Test").getOrCreate()

# local[4]: 4 parallel threads
# spark = SparkSession.builder.master("local[4]").appName("Test").getOrCreate()

# local[*]: one thread per available CPU core
# spark = SparkSession.builder.master("local[*]").appName("Test").getOrCreate()
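
To make the three master strings above concrete, here is a minimal sketch of how each one maps to a thread count. The helper `local_thread_count` is hypothetical (it is not part of PySpark), and it deliberately ignores the `local[N,F]` variants that also set a retry limit.

```python
import os
import re


def local_thread_count(master: str) -> int:
    """Return the number of worker threads implied by a local master URL.

    Hypothetical helper for illustration; Spark does this parsing internally.
    """
    if master == "local":
        return 1
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if match is None:
        raise ValueError(f"not a local master URL: {master!r}")
    value = match.group(1)
    if value == "*":
        # os.cpu_count() may return None on unusual platforms; fall back to 1.
        return os.cpu_count() or 1
    return int(value)


for url in ("local", "local[4]", "local[*]"):
    print(url, "->", local_thread_count(url))
```

The `local[*]` result depends on the machine running the script, which is why it is usually the most convenient default for development.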

Local mode is ideal for development, debugging, and tests on data that fits on a single machine.

Cluster Mode

In cluster mode, Spark connects to a cluster manager (YARN, Kubernetes, Standalone) that allocates resources across multiple machines.

# Submit to YARN
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  --driver-memory 4g \
  my_pipeline.py

# Submit to Kubernetes
# (in cluster mode the application file must be reachable from the driver pod)
spark-submit \
  --master k8s://https://k8s-api:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=my-spark:3.5 \
  my_pipeline.py

# Submit to a Standalone cluster
# (Standalone does not support cluster deploy mode for Python applications,
# so a .py file is submitted in client mode)
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  my_pipeline.py
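
The resource flags in the YARN example multiply out to a total request against the cluster. The sketch below works through that arithmetic. It assumes the commonly documented default memory overhead of max(384 MiB, 10% of the heap) per container; the real value depends on the Spark version, the cluster manager, and settings such as `spark.executor.memoryOverhead`. The function names are hypothetical, and `parse_mib` only understands `m` and `g` suffixes.

```python
def parse_mib(size: str) -> int:
    """Convert a size such as '8g' or '512m' to MiB (only m and g suffixes)."""
    units = {"m": 1, "g": 1024}
    return int(size[:-1]) * units[size[-1].lower()]


def cluster_request(num_executors: int, executor_cores: int,
                    executor_memory: str, driver_memory: str,
                    overhead_percent: int = 10,
                    min_overhead_mib: int = 384) -> dict:
    """Estimate what a spark-submit resource request adds up to.

    The overhead rule is an assumption based on Spark's documented defaults.
    """
    def container_mib(heap: str) -> int:
        heap_mib = parse_mib(heap)
        overhead = max(min_overhead_mib, heap_mib * overhead_percent // 100)
        return heap_mib + overhead

    executor_mib = container_mib(executor_memory)
    driver_mib = container_mib(driver_memory)
    return {
        "total_task_slots": num_executors * executor_cores,
        "executor_container_mib": executor_mib,
        "driver_container_mib": driver_mib,
        "total_mib": num_executors * executor_mib + driver_mib,
    }


# Flags from the YARN example: 10 executors, 4 cores and 8g each, 4g driver.
request = cluster_request(10, 4, "8g", "4g")
for key, value in request.items():
    print(f"{key}: {value}")
```

With those flags the job can run 40 tasks at once, and under the stated overhead assumption each executor container needs a little under 9 GiB rather than exactly 8.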

Client vs Cluster Deploy Mode

When submitting to a cluster manager, --deploy-mode determines where the driver process runs:

| | --deploy-mode client | --deploy-mode cluster |
|---|---|---|
| Driver location | Submitting machine | A node inside the cluster, chosen by the cluster manager |
| Best for | Interactive notebooks, debugging | Production batch jobs |
| stdout/stderr | In your terminal | In cluster logs |
| Requires connectivity | While the job is running | Only at submission |
| Network | Driver on the client machine, possibly far from the data | Driver co-located with executors |

Comparison Table

| Aspect | Local Mode | Cluster Mode |
|---|---|---|
| Hardware | Single machine | Multiple machines |
| Fault tolerance | Limited (no task retry across machines) | Full (tasks retry on other nodes) |
| Scalability | Single machine | Thousands of cores |
| Data size | GB scale | TB to PB scale |
| Cluster manager | None | YARN / Kubernetes / Standalone |
| Spark UI | http://localhost:4040 | Cluster-provided URL |
| Setup complexity | None | Cluster provisioning required |

Writing Portable Code

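import os
from pyspark.sql import SparkSession

builder = SparkSession.builder.appName("PortableApp")

# Set the master in code only when SPARK_MASTER is defined. A master set in
# code takes precedence over spark-submit --master, so a hard-coded local[*]
# fallback would silently override --master yarn in production.
master = os.environ.get("SPARK_MASTER")
if master:
    builder = builder.master(master)

spark = builder.getOrCreate()

# Dev: SPARK_MASTER="local[4]" python my_app.py
# Prod: spark-submit --master yarn my_app.py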
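
Portable code sometimes also needs settings that differ between local and cluster runs. The sketch below derives such settings from the master string. The helper names and the value `8` are hypothetical choices for illustration; `spark.sql.shuffle.partitions` is a real Spark setting whose default of 200 is often more than a small local dataset needs.

```python
def is_local_master(master: str) -> bool:
    """True for 'local', 'local[N]' and 'local[*]' master URLs."""
    return master == "local" or master.startswith("local[")


def session_options(master: str) -> dict[str, str]:
    """Return extra Spark settings for the given master (illustrative only)."""
    options: dict[str, str] = {}
    if is_local_master(master):
        # Assumed value: fewer shuffle partitions for small local datasets.
        options["spark.sql.shuffle.partitions"] = "8"
    # For cluster masters, leave tuning to spark-submit or spark-defaults.conf.
    return options


for master in ("local[4]", "yarn"):
    print(master, "->", session_options(master))
```

Each key and value can then be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`, which keeps the application logic identical in both modes.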