pyspark.sql.functions.reverse
Collection function: returns a reversed string or an array with reverse order of elements.
To reverse the order of the elements in an array (list) column of a DataFrame, we can use the PySpark function reverse() from pyspark.sql.functions. Here’s an example.
Let’s start by creating a sample DataFrame with a column that holds lists of strings.
from pyspark.sql import SparkSession
from pyspark.sql.functions import reverse
spark = SparkSession.builder.getOrCreate()
# Create sample data: a name and a list of technologies per row
data = [("Sachin", ["Python", "C", "Go"]),
        ("Renjith", ["RedShift", "Snowflake", "Oracle"]),
        ("Ahamed", ["Android", "MacOS", "Windows"])]
# Create the DataFrame with explicit column names
df = spark.createDataFrame(data, ["Name", "Techstack"])
df.show()
Output
+-------+--------------------+
|   Name|           Techstack|
+-------+--------------------+
| Sachin|     [Python, C, Go]|
|Renjith|[RedShift, Snowfl...|
| Ahamed|[Android, MacOS, ...|
+-------+--------------------+
Now, we can apply the reverse() function to the “Techstack” column to reverse the order of the list.
df_reversed = df.withColumn("Techstack", reverse(df["Techstack"]))
df_reversed.show()
Output
+-------+--------------------+
|   Name|           Techstack|
+-------+--------------------+
| Sachin|     [Go, C, Python]|
|Renjith|[Oracle, Snowflak...|
| Ahamed|[Windows, MacOS, ...|
+-------+--------------------+
As you can see, the order of the elements in each list in the “Techstack” column has been reversed. The withColumn() function adds a new column, or replaces an existing column when the given name already exists in the DataFrame. Here, we are replacing the “Techstack” column with a new column in which the lists have been reversed.