Category: spark
PySpark : Using CASE WHEN in Spark SQL to conditionally evaluate expressions : DataFrame and SQL ways explained
The CASE WHEN construct is used in Spark SQL to conditionally evaluate expressions. It is similar to the CASE statement in standard SQL…
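A minimal sketch of both approaches; the data, column names, and age threshold are illustrative, not from the article:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import when, col

spark = SparkSession.builder.appName("case-when-demo").getOrCreate()

# Toy data: (name, age)
df = spark.createDataFrame([("Alice", 34), ("Bob", 17)], ["name", "age"])

# DataFrame way: when/otherwise builds a CASE WHEN expression
df.withColumn(
    "age_group",
    when(col("age") >= 18, "adult").otherwise("minor"),
).show()

# SQL way: the equivalent CASE WHEN in Spark SQL
df.createOrReplaceTempView("people")
spark.sql("""
    SELECT name, age,
           CASE WHEN age >= 18 THEN 'adult' ELSE 'minor' END AS age_group
    FROM people
""").show()
```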
Spark : Calculating executor memory in Spark – A complete guide
The executor memory is the amount of memory allocated to each executor in a Spark cluster. It determines the amount…
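As a rough illustration of the arithmetic involved, here is a sketch assuming the default 0.10 overhead factor and the 384 MiB floor Spark applies when spark.executor.memoryOverhead is unset; the 8g executor size is arbitrary:

```python
# Hypothetical worked example, not a tuning recommendation.
executor_memory_mib = 8 * 1024                            # spark.executor.memory = 8g
overhead_mib = max(int(executor_memory_mib * 0.10), 384)  # default memoryOverhead rule
container_mib = executor_memory_mib + overhead_mib        # what the cluster manager is asked for
print(f"Container request: {container_mib} MiB")          # 8192 + 819 = 9011 MiB
```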
PySpark : Writing a DataFrame to a Snowflake table.
Overview of Snowflake and PySpark. Snowflake is a cloud-based data warehousing platform that allows users to store and analyze large…
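A minimal write sketch, assuming the spark-snowflake connector and the Snowflake JDBC driver are on the classpath; every connection value below is a placeholder:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-write-demo").getOrCreate()
df = spark.createDataFrame([(1, "widget"), (2, "gadget")], ["id", "name"])

# Connection options for the Spark Snowflake connector; all values are placeholders.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

(df.write
   .format("net.snowflake.spark.snowflake")  # source name registered by the connector
   .options(**sf_options)
   .option("dbtable", "TARGET_TABLE")        # hypothetical target table
   .mode("overwrite")
   .save())
```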
PySpark : LongType and ShortType data types in PySpark
pyspark.sql.types.LongType pyspark.sql.types.ShortType In this article, we will explore PySpark’s LongType and ShortType data types, their properties, and how to work…
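A short sketch of both types in an explicit schema; the column names and values are illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType, ShortType

spark = SparkSession.builder.appName("int-types-demo").getOrCreate()

# LongType holds 8-byte signed integers; ShortType holds 2-byte signed integers.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("population", LongType(), True),   # large values need 8 bytes
    StructField("age", ShortType(), True),         # small values fit in 2 bytes
])

df = spark.createDataFrame([("Earth", 7_900_000_000, 34)], schema)
df.printSchema()
```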
PySpark : HiveContext in PySpark – A brief explanation
One of the key components of PySpark is the HiveContext, which provides a SQL-like interface to work with data stored…
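A minimal sketch; note that HiveContext is the legacy Spark 1.x entry point, superseded in Spark 2.x+ by a SparkSession with Hive support enabled:

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-context-demo")

# Legacy entry point; the modern equivalent is
# SparkSession.builder.enableHiveSupport().getOrCreate()
hive_ctx = HiveContext(sc)

# SQL-like access to Hive-managed tables
hive_ctx.sql("SHOW TABLES").show()
```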
PySpark : Explanation of PySpark full outer join with an example.
One of the most commonly used operations in PySpark is joining two DataFrames. A full outer join is one of…
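A minimal sketch with toy DataFrames; the names and values are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("full-outer-join-demo").getOrCreate()

left = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])
right = spark.createDataFrame([(2, "HR"), (3, "Sales")], ["id", "dept"])

# A full outer join keeps unmatched rows from both sides, padded with nulls:
# id=1 has no dept and id=3 has no name, yet both appear in the result.
left.join(right, on="id", how="full_outer").show()
```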
PySpark : Reading from multiple files: how to get the file that contains each record in PySpark [input_file_name]
pyspark.sql.functions.input_file_name One of the most useful features of PySpark is the ability to access metadata about the input files being…
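A minimal sketch; the path glob is a placeholder:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name

spark = SparkSession.builder.appName("input-file-demo").getOrCreate()

# Hypothetical glob covering multiple input files
df = spark.read.json("/data/events/*.json")

# input_file_name() tags every record with the file it was read from
df_with_source = df.withColumn("source_file", input_file_name())
df_with_source.select("source_file").distinct().show(truncate=False)
```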
PySpark : Exploding a column of arrays or maps into multiple rows in a Spark DataFrame [posexplode_outer]
pyspark.sql.functions.posexplode_outer The posexplode_outer function in PySpark is part of the pyspark.sql.functions module and is used to explode a column of…
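A minimal sketch with toy data showing the null-preserving behavior:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import posexplode_outer

spark = SparkSession.builder.appName("posexplode-outer-demo").getOrCreate()

df = spark.createDataFrame(
    [(1, ["a", "b"]), (2, None)],   # second row has a null array
    ["id", "letters"],
)

# Unlike posexplode, posexplode_outer keeps rows whose array is null or
# empty, emitting null for both the position and the value.
df.select("id", posexplode_outer("letters")).show()
```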
PySpark : Transforming a column of arrays or maps into multiple rows, with the position of each element in the array or map [posexplode]
pyspark.sql.functions.posexplode The posexplode function in PySpark is part of the pyspark.sql.functions module and is used to transform a column of…
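A minimal sketch with toy data; the output columns are named pos and col by default:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import posexplode

spark = SparkSession.builder.appName("posexplode-demo").getOrCreate()

df = spark.createDataFrame([(1, ["x", "y", "z"])], ["id", "letters"])

# posexplode yields one row per element plus its 0-based position
df.select("id", posexplode("letters")).show()
```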
PySpark : Calculating the percent rank of a set of values in a DataFrame column [percent_rank]
pyspark.sql.functions.percent_rank PySpark provides a percent_rank function as part of the pyspark.sql.functions module, which is used to calculate the percent rank…
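A minimal sketch with toy data, using a window ordered by the value column; partitioning is omitted here for brevity, which Spark will warn about on real data:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import percent_rank
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("percent-rank-demo").getOrCreate()

df = spark.createDataFrame(
    [("Alice", 50), ("Bob", 70), ("Cara", 90), ("Dan", 90)],
    ["name", "score"],
)

# percent_rank is a window function: (rank - 1) / (rows in partition - 1)
w = Window.orderBy("score")
df.withColumn("pct_rank", percent_rank().over(w)).show()
```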