pyspark.sql.functions.array_union
The array_union function is a PySpark function that combines the elements of two array columns in a DataFrame. It takes exactly two columns (or column names) as arguments and returns a Column expression containing the union of their elements, without duplicates. Use it with withColumn or select to add the result to a DataFrame as a new column.
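Syntax
array_union(col1, col2), where col1 and col2 are array columns, or column names, with the same element type.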
Here’s an example of how to use the array_union function:
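Result
For a row whose two input arrays are [1, 2, 3] and [3, 4, 5], the new column holds the values 1, 2, 3, 4, and 5, each exactly once.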
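array_union accepts exactly two columns, so it cannot take three or more in a single call. To combine more than two array columns, nest the calls so that the result of one union becomes an input to the next.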
The new column contains the union of the elements of the input columns, with duplicate elements removed. This makes the function particularly useful when you want to merge several array columns into one, so that the data in the DataFrame is easier to query and analyze.
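The duplicate-removal behavior can be illustrated without Spark. The function below is a plain-Python model written for this explanation, not Spark's implementation. The name array_union_model is hypothetical, and the first-occurrence ordering it uses is an assumption rather than a documented guarantee. It also models the fact that a null input array yields a null result.

```python
from typing import Hashable, List, Optional


def array_union_model(
    left: Optional[List[Hashable]],
    right: Optional[List[Hashable]],
) -> Optional[List[Hashable]]:
    """Model of array_union semantics for a single row (illustration only)."""
    # A null (None) input array produces a null result.
    if left is None or right is None:
        return None
    seen = set()
    merged = []
    # Walk the first array, then the second, keeping each value once.
    for value in left + right:
        if value not in seen:
            seen.add(value)
            merged.append(value)
    return merged


print(array_union_model([1, 1, 2], [2, 3]))  # [1, 2, 3]
print(array_union_model([1, 2], None))       # None
```

Note that duplicates are removed even when they occur within a single input array, not only across the two arrays.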