Category: spark


PySpark @ Freshers.in

PySpark : Dropping duplicate rows in PySpark – A Comprehensive Guide with Examples

PySpark provides several methods to remove duplicate rows from a DataFrame. In this article, we will go over the steps…


PySpark : Replacing null values in a PySpark DataFrame column with 0 or any value you wish

To replace null values in a PySpark DataFrame column with a numeric value (e.g., 0), you can…


PySpark : unix_timestamp function – A comprehensive guide

One of the key functionalities of PySpark is the ability to transform data into the desired format. In some cases,…


PySpark : Reading a Parquet file stored on Amazon S3 using PySpark

To read a Parquet file stored on Amazon S3 using PySpark, you can use the following code: from pyspark.sql import…


PySpark : Setting PySpark parameters – A Complete Walkthrough [3 Ways]

In PySpark, you can set various parameters to configure your Spark application. These parameters can be set in different ways…


Spark : Calculating executor memory in Spark – A complete guide

The executor memory is the amount of memory allocated to each executor in a Spark cluster. It determines the amount…
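A common rule-of-thumb calculation, worked through for a hypothetical 64 GB / 16-core worker node (the numbers are invented for illustration, not from the article):

```python
# Hypothetical worker node: 64 GB RAM, 16 cores.
node_ram_gb = 64
node_cores = 16

# Reserve 1 core and 1 GB for the OS / Hadoop daemons.
usable_cores = node_cores - 1
usable_ram_gb = node_ram_gb - 1

# ~5 cores per executor is a common rule of thumb for good HDFS throughput.
cores_per_executor = 5
executors_per_node = usable_cores // cores_per_executor       # 3

# Split the usable RAM, then leave ~7% for spark.executor.memoryOverhead.
ram_per_executor_gb = usable_ram_gb // executors_per_node     # 21
executor_memory_gb = int(ram_per_executor_gb * 0.93)          # 19 -> --executor-memory 19g
```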


PySpark : PySpark program to write a DataFrame to a Snowflake table

Overview of Snowflake and PySpark. Snowflake is a cloud-based data warehousing platform that allows users to store and analyze large…
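The write itself goes through the spark-snowflake connector. A sketch of the options it expects, with every connection value a placeholder and the side-effecting `save()` left commented out (the connector jar, e.g. `net.snowflake:spark-snowflake_2.12`, must be on the classpath):

```python
# Documented source name of the spark-snowflake connector.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

# Hypothetical connection options; every value below is a placeholder.
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "MY_USER",
    "sfPassword": "MY_PASSWORD",
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
}

# df.write.format(SNOWFLAKE_SOURCE_NAME).options(**sf_options) \
#     .option("dbtable", "MY_TABLE").mode("overwrite").save()
```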


PySpark : LongType and ShortType data types in PySpark

pyspark.sql.types.LongType, pyspark.sql.types.ShortType. In this article, we will explore PySpark’s LongType and ShortType data types, their properties, and how to work…


PySpark : HiveContext in PySpark – A brief explanation

One of the key components of PySpark is the HiveContext, which provides a SQL-like interface to work with data stored…