Tag: Big Data

Hive : How to Drop Duplicate Rows from a Hive Table

user July 5, 2023

This is a workaround that shows how we can drop duplicate rows from a Hive table. Here is how to…

PySpark : Understanding the PySpark next_day Function

user July 4, 2023

Time series data often involves handling and manipulating dates. Apache Spark, through its PySpark interface, provides an arsenal of date-time…

PySpark : Extracting the Month from a Date in PySpark

user July 4, 2023

Working with dates and time is a common task in data analysis. Apache Spark provides a variety…

PySpark : Calculating the Difference Between Dates with PySpark: The months_between Function

user July 4, 2023

When working with time series data, it is often necessary to calculate the time difference between two dates. Apache Spark…

PySpark : Retrieving Unique Elements from two arrays in PySpark

user July 4, 2023

Let’s start by creating a DataFrame named freshers_in. We’ll give it two array columns, ‘array1’ and ‘array2’, filled…

Hive : How to preserve Hive metadata [Preserve the last DDL time for the table]

user July 4, 2023

HOLD_DDLTIME

The “last DDL time” refers to the timestamp of the most recent DDL (Data Definition Language) operation that was…

Extracting Unique Values From Array Columns in PySpark

user June 28, 2023

When dealing with data in Spark, you may find yourself needing to extract distinct values from array columns. This can…

PySpark : Returning an Array that Contains Matching Elements in Two Input Arrays in PySpark

user June 24, 2023

This article will focus on a particular use case: returning an array that contains the matching elements in two input…

PySpark : Creating Ranges in PySpark DataFrame with Custom Start, End, and Increment Values

user June 22, 2023

Before Spark 2.4, PySpark had no built-in function to create an array sequence given a start, end, and increment value. In PySpark,…

PySpark : How to Prepend an Element to an Array on a Specific Condition in PySpark

user June 16, 2023

If you want to prepend an element to the array only when the array contains a specific word, you can…
