PySpark : Reversing the order of lists in a dataframe column using PySpark

user July 5, 2023

pyspark.sql.functions.reverse

Collection function: returns a reversed string or an array with reverse order of elements.

In order to reverse the order of lists in a dataframe column, we can use the PySpark function reverse() from pyspark.sql.functions. Here’s an example.

Let’s start by creating a sample dataframe in which one column holds a list of strings for each row.

from pyspark.sql import SparkSession
from pyspark.sql.functions import reverse

spark = SparkSession.builder.getOrCreate()

# Create sample data: (name, list of technologies)
data = [("Sachin", ["Python", "C", "Go"]),
        ("Renjith", ["RedShift", "Snowflake", "Oracle"]),
        ("Ahamed", ["Android", "MacOS", "Windows"])]

# Create the DataFrame
df = spark.createDataFrame(data, ["Name", "Techstack"])
df.show()

Output

+-------+--------------------+
|   Name|           Techstack|
+-------+--------------------+
| Sachin|     [Python, C, Go]|
|Renjith|[RedShift, Snowfl...|
| Ahamed|[Android, MacOS, ...|
+-------+--------------------+

Now, we can apply the reverse() function to the “Techstack” column to reverse the order of the list.

df_reversed = df.withColumn("Techstack", reverse(df["Techstack"]))
df_reversed.show()

Output

+-------+--------------------+
|   Name|           Techstack|
+-------+--------------------+
| Sachin|     [Go, C, Python]|
|Renjith|[Oracle, Snowflak...|
| Ahamed|[Windows, MacOS, ...|
+-------+--------------------+

As you can see, the order of the elements in each list in the “Techstack” column has been reversed. The withColumn() function adds a new column, or replaces an existing column if one with the same name is already in the dataframe. Here, because we pass the existing name “Techstack”, the original column is replaced with one in which the lists have been reversed.
