In PySpark, how do you sort data in descending order while putting the rows with null values at the beginning?

PySpark @ Freshers.in

pyspark.sql.Column.desc_nulls_first

In PySpark, the desc_nulls_first function sorts a column in descending order while placing rows with null values at the beginning of the result set. It is typically used together with the sort (or orderBy) function on a DataFrame.

Here’s an example of how you might use desc_nulls_first in PySpark:

Sample Code:

from pyspark.sql import SparkSession
from pyspark.sql.functions import desc_nulls_first

# Create a SparkSession (in the PySpark shell, `spark` is already defined)
spark = SparkSession.builder.appName("desc_nulls_first_example").getOrCreate()

# Create a DataFrame
df = spark.createDataFrame([
    (1, "Sam John", None),
    (2, None, 3.0),
    (3, "Peter Walter", 1.0),
    (4, "James Jack", 3.0),
    (5, "Benjamin Brooke", None),
], ["allocation_id", "full_name", "value"])

# Sort the DataFrame by the "value" column in descending order, putting null values first
sorted_df = df.sort(desc_nulls_first("value"))
sorted_df.show()

Sorted output:

+-------------+---------------+-----+
|allocation_id|      full_name|value|
+-------------+---------------+-----+
|            5|Benjamin Brooke| null|
|            1|       Sam John| null|
|            4|     James Jack|  3.0|
|            2|           null|  3.0|
|            3|   Peter Walter|  1.0|
+-------------+---------------+-----+

As you can see, the rows are sorted by the "value" column in descending order, and the rows with null values appear first.
