Column-wise comparisons in PySpark: getting the maximum value with the greatest function

PySpark @ Freshers.in

pyspark.sql.functions.greatest

PySpark’s pyspark.sql.functions.greatest handles a common comparison task: given two or more columns, it returns the greatest value among them for each row. Null values are skipped, and the result is null only when every input is null.
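
To make the row-wise behaviour concrete, here is a minimal plain-Python sketch of what greatest computes for a single row. It is only a model of the semantics described in the PySpark documentation (nulls skipped, null returned when every input is null), not how Spark implements it; the helper name greatest_model and the sample rows with missing marks are hypothetical.

```python
from typing import Optional, Sequence


def greatest_model(values: Sequence[Optional[int]]) -> Optional[int]:
    """Hypothetical plain-Python model of greatest for one row."""
    # Nulls (None) are skipped before comparing
    non_null = [v for v in values if v is not None]
    # If every input was null, the result is null as well
    return max(non_null) if non_null else None


# Hypothetical rows: a name followed by three marks, some of them missing
rows = [
    ("Sachin", 85, 90, 88),
    ("Sangeeth", 92, None, 93),
    ("Rakesh", None, None, None),
]

# Apply the model to the marks of each row independently
highest = {name: greatest_model(marks) for name, *marks in rows}
print(highest)
```

Each row is evaluated on its own, which is the key difference from an aggregate that scans a whole column.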

Python’s built-in max works on an in-memory list, but greatest is built for PySpark DataFrames. It compares columns row by row as a Spark column expression, so the work runs inside Spark’s distributed engine rather than in a Python loop on the driver. It should not be confused with pyspark.sql.functions.max, which is an aggregate that returns the largest value down a single column.

Before starting, make sure PySpark and its dependencies are installed. The following example uses hardcoded data:

PySpark DataFrame operations and Column-wise Max in PySpark

from pyspark.sql import SparkSession
from pyspark.sql.functions import greatest
# Initialize Spark session
spark = SparkSession.builder.appName("greatest_demo @ Freshers.in").getOrCreate()
# Create a DataFrame with hardcoded data
data = [("Sachin", 85, 90, 88), ("Sangeeth", 92, 87, 93), ("Rakesh", 88, 89, 91)]
df = spark.createDataFrame(data, ["Name", "Math", "Physics", "Chemistry"])
# Determine the highest marks for each student
df_with_greatest = df.withColumn("Highest_Mark", greatest("Math", "Physics", "Chemistry"))
# Display the results
df_with_greatest.show()

When run, the script prints a DataFrame with each student’s name, their marks in the three subjects, and a Highest_Mark column holding the highest of the three:

+--------+----+-------+---------+------------+
|    Name|Math|Physics|Chemistry|Highest_Mark|
+--------+----+-------+---------+------------+
|  Sachin|  85|     90|       88|          90|
|Sangeeth|  92|     87|       93|          93|
|  Rakesh|  88|     89|       91|          91|
+--------+----+-------+---------+------------+
Author: user