Tag: PySpark

PySpark @ Freshers.in

Concatenating two or more maps into a single map : map_concat

The map_concat function in PySpark is designed to concatenate two or more maps into a single map. It merges key-value…

Removing leading spaces (spaces on the left side) from a string in PySpark

PySpark, a leading tool in big data processing, provides several functions for string manipulation, one of which is ltrim. This…

Adding a new column to a DataFrame with a constant value

The lit function in PySpark is a straightforward yet powerful tool for adding constant values as new columns in a…

Finding the position of a substring within a string using PySpark

pyspark.sql.functions.locate: PySpark, a tool for handling large-scale data processing, offers many functions for string manipulation, one of which…

PySpark : Reference a column in a DataFrame – col

In the world of PySpark, efficient data manipulation and transformation are key to handling big data. The col function plays…

Perform ascending sorting of data while placing null values at the end in PySpark

In the realm of big data processing with PySpark, handling null values efficiently during sorting operations is crucial. The asc_nulls_last…

Mastering the Pivot function in PySpark : Rotate data from a long format to a wide format

Understanding pivot in PySpark: this article aims to explain the concept of pivot, its advantages, and its practical application through…

PySpark sorts data within each partition independently : Efficient sorting

In the realm of big data processing with PySpark, managing data efficiently is crucial. sortWithinPartitions emerges as a key method…

How to perform SQL-like column transformations in PySpark : selectExpr

selectExpr is a method that simplifies and enhances data transformation. This article aims to demystify selectExpr, highlighting its advantages and demonstrating…
