Spark crossJoin Explained
A cross join returns the Cartesian product of two datasets: every row of the first dataset is paired with every row of the second. This article walks through how that works in PySpark.
In the example below, we create two small DataFrames and combine them with the crossJoin method to get the final output.
from pyspark.sql import SparkSession
# Create a Spark session
spark = SparkSession.builder \
    .appName("CrossJoinExample") \
    .getOrCreate()
# Sample data for two DataFrames
data1 = [("Narender", 1000), ("John", 2000)]
data2 = [("India", 100), ("UK", 200)]
columns1 = ["Name", "salary"]
columns2 = ["address", "pincode"]
df1 = spark.createDataFrame(data1, columns1)
df2 = spark.createDataFrame(data2, columns2)
# Perform a cross join
cross_joined_df = df1.crossJoin(df2)
# Show the result
cross_joined_df.show()
# Stop the Spark session
spark.stop()
The output of the above program is shown below.
# Output:
# +--------+------+-------+-------+
# |    Name|salary|address|pincode|
# +--------+------+-------+-------+
# |Narender|  1000|  India|    100|
# |Narender|  1000|     UK|    200|
# |    John|  2000|  India|    100|
# |    John|  2000|     UK|    200|
# +--------+------+-------+-------+
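To see why two rows crossed with two rows gives four rows, the same pairing can be sketched in plain Python without Spark. This is only an illustration of the Cartesian product idea using the sample data from the example, with the standard library's itertools.product standing in for crossJoin. It is not how Spark executes the join internally.

```python
from itertools import product

# Same sample rows as the PySpark example
data1 = [("Narender", 1000), ("John", 2000)]
data2 = [("India", 100), ("UK", 200)]

# Cartesian product: pair every row of data1 with every row of data2,
# then concatenate each pair of tuples into one combined row
rows = [left + right for left, right in product(data1, data2)]

for row in rows:
    print(row)

# The number of result rows is always len(data1) * len(data2)
print(len(rows) == len(data1) * len(data2))
```

Because the result size is the product of the two input sizes, a cross join grows quickly: two inputs of 1,000 rows each would already produce 1,000,000 rows.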