pyspark.sql.Column.desc_nulls_last
In PySpark, desc_nulls_last sorts data in descending order while placing rows with null values at the end of the result set. It is usually passed to a DataFrame's sort function, and it is available both as a method on Column and as a function in pyspark.sql.functions.
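To make the ordering rule concrete before involving Spark, here is a minimal plain-Python sketch of "descending, nulls last". The helper name desc_nulls_last_order is hypothetical and exists only for this illustration; it models the ordering and is not how Spark implements it.

```python
# Plain-Python model of "descending, nulls last"; no Spark involved.
# desc_nulls_last_order is a hypothetical helper, not part of PySpark.
values = [None, 3.0, 1.0, 3.0, None]


def desc_nulls_last_order(items):
    """Return items sorted in descending order, with None values moved to the end."""
    non_null = sorted((v for v in items if v is not None), reverse=True)
    nulls = [v for v in items if v is None]
    return non_null + nulls


ordered = desc_nulls_last_order(values)
print(ordered)  # [3.0, 3.0, 1.0, None, None]
```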
Here’s an example of how you might use desc_nulls_last in PySpark:
from pyspark.sql import SparkSession
from pyspark.sql.functions import desc_nulls_last
# Get or create the SparkSession used below
spark = SparkSession.builder.getOrCreate()
# Create a DataFrame
df = spark.createDataFrame([
    (1, "Sam John", None),
    (2, None, 3.0),
    (3, "Peter Walter", 1.0),
    (4, "James Jack", 3.0),
    (5, "Benjamin Brooke", None),
], ["allocation_id", "full_name", "value"])
# Sort the DataFrame by the "value" column in descending order, putting null values last
sorted_df = df.sort(desc_nulls_last("value"))
sorted_df.show()
This will output:
+-------------+---------------+-----+
|allocation_id|      full_name|value|
+-------------+---------------+-----+
|            4|     James Jack|  3.0|
|            2|           null|  3.0|
|            3|   Peter Walter|  1.0|
|            1|       Sam John| null|
|            5|Benjamin Brooke| null|
+-------------+---------------+-----+
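As you can see, the output is sorted in descending order and the null values are placed last. Note that the relative order of rows with equal sort values (such as the two 3.0 rows) is not guaranteed when sorting on a single column.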
It is also worth noting that desc_nulls_last can be applied to several sort columns at once. In that case, sorting follows the order in which the columns are passed: rows are ordered by the first column, and ties are broken by the next one.
sorted_df = df.sort(desc_nulls_last("value"), desc_nulls_last("full_name"))
sorted_df.show()