In PySpark how to sort data in descending order, while putting the rows with null values at the end of the result ?


pyspark.sql.Column.desc_nulls_last

In PySpark, the desc_nulls_last function sorts data in descending order while placing rows with null values at the end of the result set. It is typically passed to the sort (or orderBy) function.

Here’s an example of how you might use desc_nulls_last in PySpark:

from pyspark.sql import SparkSession
from pyspark.sql.functions import desc_nulls_last

spark = SparkSession.builder.appName("desc_nulls_last_example").getOrCreate()

# Create a DataFrame
df = spark.createDataFrame([
    (1, "Sam John", None),
    (2, None, 3.0),
    (3, "Peter Walter", 1.0),
    (4, "James Jack", 3.0),
    (5, "Benjamin Brooke", None)], ["allocation_id", "full_name", "value"])

# Sort the DataFrame by the "value" column in descending order, putting null values last
sorted_df = df.sort(desc_nulls_last("value"))
sorted_df.show()

This will output:

+-------------+---------------+-----+
|allocation_id|      full_name|value|
+-------------+---------------+-----+
|            4|     James Jack|  3.0|
|            2|           null|  3.0|
|            3|   Peter Walter|  1.0|
|            1|       Sam John| null|
|            5|Benjamin Brooke| null|
+-------------+---------------+-----+

As you can see, the output is sorted in descending order on the value column, and the rows with null values appear last.

It is also worth noting that this function can be used to sort on multiple columns; in that case, the sort is applied in the order of the columns you pass.

# Sort by "value" first, then by "full_name", both descending with nulls last
sorted_df = df.sort(desc_nulls_last("value"), desc_nulls_last("full_name"))
sorted_df.show()
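
Since the reference above is pyspark.sql.Column.desc_nulls_last, note that the same ordering can also be expressed as a method on a Column rather than through the standalone function from pyspark.sql.functions. A minimal sketch, assuming the same df as above:

# Equivalent form using the Column method instead of the standalone function
sorted_df = df.sort(df["value"].desc_nulls_last())
sorted_df.show()

Both forms produce the same ordering; which one you use is largely a matter of style.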