Adding a new column to a DataFrame with a constant value

PySpark @ Freshers.in

The lit function in PySpark adds a constant value as a new column in a DataFrame. This article explains what lit does, outlines its advantages, and walks through a practical use case.

Understanding lit in PySpark

The lit function creates a column that holds a literal (constant) value. Combined with a DataFrame method such as withColumn, it adds a new column with the same fixed value in every row. It is imported from pyspark.sql.functions:

from pyspark.sql.functions import lit

Advantages of using lit

  • Flexibility: Accepts constants of different types (strings, numbers, booleans), and the resulting column can be combined with other column expressions.
  • Simplicity: Easy to use for creating new columns with fixed values.
  • Data Enrichment: Useful for appending static data to dynamic datasets.

Use case: Adding a constant identifier to a name list

Let’s consider a scenario where we have a dataset containing names: Sachin, Ram, Raju, David, and Wilson. Suppose we want to add a new column that identifies each name as belonging to a particular group.

Dataset

Name
Sachin
Ram
Raju
David
Wilson

Objective

Add a new column, Group, with a constant value ‘GroupA’ for all rows.

Implementation in PySpark

Setting up the PySpark environment and creating the DataFrame:

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit
# Initialize Spark Session
spark = SparkSession.builder.appName("Lit Example").getOrCreate()
# Sample Data
data = [("Sachin",), ("Ram",), ("Raju",), ("David",), ("Wilson",)]
# Creating DataFrame
df = spark.createDataFrame(data, ["Name"])
df.show()

Applying the lit function:

Output

The DataFrame now includes a new column, Group, with the constant value ‘GroupA’:

Name    Group
Sachin  GroupA
Ram     GroupA
Raju    GroupA
David   GroupA
Wilson  GroupA

Author: user