pyspark.sql.functions.array_remove
Syntax
pyspark.sql.functions.array_remove(col, element)
pyspark.sql.functions.array_remove is a collection function that removes every element equal to element from an array column and returns the resulting array. For example, if you have a DataFrame with a column named “colors” that contains arrays of strings, you can use array_remove to remove the string “red” from every array in that column:
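from pyspark.sql import SparkSession
from pyspark.sql.functions import array_remove
# Reuse the active session if one exists (e.g. in a notebook), otherwise create one
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, ["red", "blue", "green"]), (2, ["yellow", "red", "purple"])], ["id", "colors"])
df.show(20, False)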
+---+---------------------+
|id |colors               |
+---+---------------------+
|1  |[red, blue, green]   |
|2  |[yellow, red, purple]|
+---+---------------------+
Now we need to remove “red” from the “colors” column:
df.select("id", array_remove("colors", "red").alias("new_colors")).show()
Result
+---+----------------+
| id|      new_colors|
+---+----------------+
|  1|   [blue, green]|
|  2|[yellow, purple]|
+---+----------------+
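To make the "all occurrences" behavior concrete without starting a Spark session, here is a minimal plain-Python sketch of what array_remove does to a single row's array. The helper name array_remove_model is hypothetical and is not part of PySpark. It assumes the usual Spark behavior that a null array yields null, and it only illustrates the row-level semantics, not how Spark implements them.

```python
from typing import List, Optional


def array_remove_model(values: Optional[List[str]], element: str) -> Optional[List[str]]:
    """Hypothetical stand-in for array_remove applied to one row's array."""
    # Assumption: a null (None) array stays null rather than becoming an empty list.
    if values is None:
        return None
    # Drop every element equal to `element`, keeping the remaining order.
    return [v for v in values if v != element]


rows = [
    ["red", "blue", "red"],  # duplicates: both "red" entries are removed
    ["green"],               # element absent: array is unchanged
    ["red"],                 # only the element: result is an empty array
    None,                    # null array
]
results = [array_remove_model(r, "red") for r in rows]
print(results)  # [['blue'], ['green'], [], None]
```

Note that an array containing only the removed element becomes an empty array, which is different from a null value.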