# Spark unionByName method explained

`unionByName` returns a new DataFrame containing the union of rows in this and another DataFrame, resolving columns by name rather than by position. See the Spark documentation:
https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.unionByName.html

## Combine datasets


We have several datasets with different metrics about contracts. Each dataset contains a unique key identifying the contract.

  **Premiums**

 | Contract Key | Premium     |
 | ------------ | ----------- |
 | A            | $100,000    |
 | B            | $10,000     |

 **Losses**

 | Contract Key | Loss        |
 | ------------ | ----------- |
 | A            | $70,000     |
 | B            | $50,000     |

 **Costs**

 | Contract Key | Cost   |
 | ------------ | ------ |
 | A            | $5,000 |
 | B            | $1,000 |


 We need to create one flat table based on all these tables.


 **Result**
 | Contract Key | Premium    | Loss    | Cost   |
 | ------------ | ---------- | ------- | ------ |
 | A            | $100,000   | $70,000 | $5,000 |
 | B            | $10,000    | $50,000 | $1,000 |

**Code**

The values below match the tables above (the column names `Key`, `Premium`, `Loss`, and `Cost` mirror the table headers):

```python
from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder \
    .appName("UnionByNameExample") \
    .getOrCreate()

# Build the three source DataFrames from the tables above
premiums = spark.createDataFrame([("A", 100000), ("B", 10000)], ["Key", "Premium"])
losses = spark.createDataFrame([("A", 70000), ("B", 50000)], ["Key", "Loss"])
costs = spark.createDataFrame([("A", 5000), ("B", 1000)], ["Key", "Cost"])

# Perform union by name; allowMissingColumns=True fills
# columns absent from a source DataFrame with nulls
union_df = premiums.unionByName(losses, allowMissingColumns=True) \
    .unionByName(costs, allowMissingColumns=True)

# Show the result
union_df.show()

# Stop the Spark session
spark.stop()
```

**Output**

```
+---+-------+-----+----+
|Key|Premium| Loss|Cost|
+---+-------+-----+----+
|  A| 100000| null|null|
|  B|  10000| null|null|
|  A|   null|70000|null|
|  B|   null|50000|null|
|  A|   null| null|5000|
|  B|   null| null|1000|
+---+-------+-----+----+
```

Note that `unionByName` only stacks the rows: each contract appears three times, with nulls in the columns that did not come from that source DataFrame.