Category: spark


PySpark @ Freshers.in

PySpark : Generating a 64-bit hash value in PySpark

Introduction to 64-bit Hashing: A hash function maps data of arbitrary size…


PySpark : Introduction to BASE64_ENCODE and its Applications in PySpark

BASE64 is a group of similar binary-to-text encoding schemes that represent binary…


PySpark : Understanding the PySpark next_day Function

Time series data often involves handling and manipulating dates. Apache Spark, through its PySpark interface, provides an arsenal of date-time…


PySpark : Extracting the Month from a Date in PySpark

Working with dates and times is a common task in data analysis. Apache Spark provides a variety…


PySpark : Retrieving Unique Elements from two arrays in PySpark

Let’s start by creating a DataFrame named freshers_in. We’ll make it contain two array columns named ‘array1’ and ‘array2’, filled…


Extracting Unique Values From Array Columns in PySpark

When dealing with data in Spark, you may find yourself needing to extract distinct values from array columns. This can…
