Spark crossJoin explained

A cross join returns the Cartesian product of two datasets: every row of the first dataset is paired with every row of the second. This article walks through how cross joins work in Spark.

In the example below, we create two DataFrames and combine them with the crossJoin method to get the final output.

 
    from pyspark.sql import SparkSession

    # Create a Spark session
    spark = SparkSession.builder \
        .appName("CrossJoinExample") \
        .getOrCreate()

    # Sample data for two DataFrames
    data1 = [("Narender", 1000), ("John", 2000)]
    data2 = [("India", 100), ("UK", 200)]

    columns1 = ["Name", "salary"]
    columns2 = ["address", "pincode"]

    df1 = spark.createDataFrame(data1, columns1)
    df2 = spark.createDataFrame(data2, columns2)

    # Perform a cross join
    cross_joined_df = df1.crossJoin(df2)

    # Show the result
    cross_joined_df.show()

    # Stop the Spark session
    spark.stop()

The output of the above program is shown below.

 
    # Output:
    #+--------+------+-------+-------+
    #|    Name|salary|address|pincode|
    #+--------+------+-------+-------+
    #|Narender|  1000|  India|    100|
    #|Narender|  1000|     UK|    200|
    #|    John|  2000|  India|    100|
    #|    John|  2000|     UK|    200|
    #+--------+------+-------+-------+
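
To make the Cartesian-product idea concrete, the same four rows can be reproduced in plain Python with no Spark involved. This is only an illustrative sketch of the concept, not how Spark executes a cross join internally; the variable name cross_rows is a placeholder chosen for this example.

```python
from itertools import product

# Same sample rows as the two DataFrames above
data1 = [("Narender", 1000), ("John", 2000)]
data2 = [("India", 100), ("UK", 200)]

# Cartesian product: pair every row of data1 with every row of data2,
# then concatenate each pair into a single combined row
cross_rows = [left + right for left, right in product(data1, data2)]

for row in cross_rows:
    print(row)

# The number of result rows is the product of the two input sizes
print(len(cross_rows) == len(data1) * len(data2))
```

Because the result size is the product of the two input sizes (2 x 2 = 4 rows here), a cross join grows quickly as the inputs get larger, so it is worth checking the row counts of both sides before running one on big datasets.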