Author: user

PySpark @ Freshers.in

Optimizing PySpark queries with Adaptive Query Execution (AQE) – Example included

user September 29, 2023

Spark 3 brought numerous enhancements, and one of the most notable is Adaptive Query Execution (AQE). AQE is…

Transferring an Elastic IP between AWS accounts – Step-by-step process

user September 29, 2023

An AWS Elastic IP (EIP) is a static public IPv4 address that users can allocate to AWS resources like EC2…

Python @ Freshers.in

Handling NULL values in dynamic SQL insert statements using Python

user September 29, 2023

In this article, we dynamically create and execute SQL insert statements to add rows from a DataFrame to a Snowflake…
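
A minimal sketch of the general idea, assuming rows arrive as Python tuples in which missing values appear as `None` or float `NaN` (as they often do after reading a pandas DataFrame). It uses the standard-library `sqlite3` module as a stand-in for the Snowflake connector so it runs locally; the `employees` table and its columns are hypothetical. The point is to pass values as bind parameters instead of formatting them into the SQL string, so a missing value becomes SQL `NULL` rather than the text `'None'` or `'nan'`.

```python
import math
import sqlite3


def clean_value(value):
    """Map missing values (None or float NaN) to None so the driver writes SQL NULL."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def build_insert(table, columns):
    """Build a parameterized INSERT with one '?' placeholder per column.

    Table and column names are interpolated directly, so they must come from
    trusted code, never from user input. Only the values are bound parameters.
    """
    placeholders = ", ".join("?" for _ in columns)
    column_list = ", ".join(columns)
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"


def insert_rows(conn, table, columns, rows):
    """Clean every value, then insert all rows in one executemany call."""
    sql = build_insert(table, columns)
    cleaned = [tuple(clean_value(v) for v in row) for row in rows]
    conn.executemany(sql, cleaned)
    conn.commit()
    return sql


# Hypothetical table and rows used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, age INTEGER, salary REAL)")
rows = [("Alice", 30, 5000.0), ("Bob", None, float("nan"))]
sql = insert_rows(conn, "employees", ["name", "age", "salary"], rows)
result = conn.execute("SELECT name, age, salary FROM employees ORDER BY name").fetchall()
print(result)  # [('Alice', 30, 5000.0), ('Bob', None, None)]
```

Placeholder syntax differs between drivers (`sqlite3` uses `?`), so check which parameter style your Snowflake connector is configured for before adapting this.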

PySpark @ Freshers.in

PySpark: Calculate the Euclidean distance (the square root of the sum of squares) using hypot

user September 27, 2023

In PySpark, the hypot function is a mathematical function used to calculate the Euclidean distance or the square root of…
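
In PySpark this is exposed as `pyspark.sql.functions.hypot(col1, col2)`, which returns the square root of `col1**2 + col2**2` for each row. The plain-Python sketch below only illustrates that arithmetic so it runs without a Spark session; the sample pairs are made up.

```python
import math


def hypot_rows(pairs):
    """Compute sqrt(a**2 + b**2) for each (a, b) pair, one result per row."""
    return [math.sqrt(a * a + b * b) for a, b in pairs]


# Hypothetical sample rows: (x, y) offsets from the origin.
pairs = [(3.0, 4.0), (5.0, 12.0), (0.0, 0.0)]
distances = hypot_rows(pairs)

# Python's built-in math.hypot computes the same quantity.
assert distances == [math.hypot(a, b) for a, b in pairs]
print(distances)  # [5.0, 13.0, 0.0]
```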

PySpark @ Freshers.in

PySpark: How to compute covariance using covar_pop and covar_samp

user September 27, 2023

Covariance is a statistical measure that indicates the extent to which two variables change together. If the variables increase and…
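
The two PySpark aggregates `covar_pop(col1, col2)` and `covar_samp(col1, col2)` differ only in the divisor: the population version divides the sum of the products of deviations from the mean by n, the sample version by n - 1. The plain-Python sketch below (no Spark session, made-up data) computes both so the difference is visible; returning `None` when there are too few rows is this sketch's own choice.

```python
def covariance(xs, ys, sample=False):
    """Population (divide by n) or sample (divide by n - 1) covariance of two lists."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    # Sample covariance needs at least two observations; this sketch returns None then.
    if n == 0 or (sample and n < 2):
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of each pair's deviations from the means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Hypothetical data: y rises whenever x rises, so the covariance is positive.
hours_studied = [1, 2, 3, 4]
score_gain = [2, 4, 6, 8]

pop = covariance(hours_studied, score_gain)                # analogous to covar_pop
samp = covariance(hours_studied, score_gain, sample=True)  # analogous to covar_samp
print(pop, samp)  # 2.5 3.3333333333333335
```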

Automated email responses using Gmail and Google Sheets with Google Apps Script

user September 27, 2023

Automated email responses can be set up using Google Apps Script, a scripting platform developed by Google for lightweight application development…

AWS Glue @ Freshers.in

Navigating job dependencies in AWS Glue – Managing ETL workflows

user September 27, 2023

AWS Glue manages dependencies between jobs using triggers. Triggers can start jobs based on the completion status of other jobs,…
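
As a rough sketch of the trigger-based approach, the function below builds the request for a conditional trigger that starts one job only after every listed upstream job has succeeded. The trigger and job names are hypothetical. The dictionary keys follow the parameters of boto3's Glue `create_trigger` call, which is left commented out so the sketch runs without AWS credentials.

```python
def build_conditional_trigger(name, upstream_jobs, downstream_job, state="SUCCEEDED"):
    """Build keyword arguments for a Glue CONDITIONAL trigger.

    The downstream job starts only when every upstream job reaches `state`
    (the predicate combines the conditions with AND).
    """
    return {
        "Name": name,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        "Predicate": {
            "Logical": "AND",
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": job, "State": state}
                for job in upstream_jobs
            ],
        },
        "Actions": [{"JobName": downstream_job}],
    }


# Hypothetical pipeline: load the warehouse after both extract jobs succeed.
trigger_args = build_conditional_trigger(
    "run-load-after-extract",
    ["extract_orders", "extract_customers"],
    "load_warehouse",
)

# With AWS credentials configured, the trigger could be created with:
#   import boto3
#   boto3.client("glue").create_trigger(**trigger_args)
print(trigger_args["Predicate"]["Conditions"])
```

Using `"Logical": "ANY"` instead would start the downstream job as soon as any one of the upstream conditions is met.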

Airflow: “The scheduler does not appear to be running. Last heartbeat was received 20 minutes ago. The DAGs list may not update” – Resolved

user September 27, 2023

You may see the following error in Airflow: “The scheduler does not appear to be running. Last heartbeat was received…

PySpark @ Freshers.in

Spark repartition() vs coalesce() – A complete guide

user September 27, 2023

In PySpark, managing data across different partitions is crucial for optimizing performance, especially for large-scale data processing tasks. Two methods…

PySpark @ Freshers.in

Grouping and aggregating multi-column data with PySpark – Complete example included

user September 27, 2023

The groupBy function is widely used in PySpark SQL to group the DataFrame based on one or multiple columns, apply…

Copyright © 2025 Freshers.in