Month: August 2021

PySpark

PySpark: How to get rows that have nulls in a column, rows without nulls, or a count of non-null values

pyspark.sql.Column.isNotNull: isNotNull() returns True if the current expression is NOT null; isNull() returns True if the current expression is null. With…

PySpark

PySpark – groupBy with aggregation (count, sum, mean, min, max)

pyspark.sql.DataFrame.groupBy: PySpark's groupBy() groups the DataFrame by the specified columns so that aggregations (count, sum, mean, min, max) can be run on them…

PySpark

PySpark filter: How to filter data in PySpark – multiple options explained

pyspark.sql.DataFrame.filter: The PySpark filter() function is used to filter the rows of a Spark DataFrame; in short, it is used for cleansing…

Amazon CloudFront

Amazon CloudFront quick reference and cheat sheet

1. CloudFront gives developers an easy and cost-effective way to distribute content with low latency and high data transfer speeds….

Amazon Aurora

Amazon Aurora quick reference and cheat sheet

1. Aurora is an AWS proprietary database. 2. Aurora is a fully managed service. 3. Aurora has high performance and…

AWS Athena

Amazon Athena quick reference and cheat sheet

1. Amazon Athena is an interactive query service to analyze data in Amazon S3 using standard SQL. 2. Athena is…

Python

Python throws NameError: name ‘__file__’ is not defined – Solution

When executing os.path.dirname(os.path.realpath(__file__)) in the Python interactive shell, you will get the error NameError: name ‘__file__’ is not defined. This is…
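One common workaround is sketched below, assuming that falling back to the current working directory is acceptable when `__file__` is missing (as it is in the interactive shell). The helper name `script_dir` is hypothetical.

```python
import os


def script_dir() -> str:
    """Return the directory of the running script.

    Hypothetical helper: falls back to the current working directory
    when __file__ is not defined, e.g. in the interactive shell.
    """
    try:
        # __file__ exists only when code is run from a file
        return os.path.dirname(os.path.realpath(__file__))
    except NameError:
        # Interactive shell: no __file__, so use the working directory instead
        return os.getcwd()


print(script_dir())
```

Run as a script, this prints the script's directory; pasted into the interactive shell, it prints the working directory instead of raising `NameError`.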

Amazon API Gateway

Amazon API Gateway quick reference and cheat sheet

1. Amazon API Gateway is an AWS service for creating, publishing, maintaining, monitoring, and securing REST, HTTP, and WebSocket APIs…

Hive

How to drop multiple partitions in Hive using a condition

Hive partitioning is a good and easy way to organize Hive tables by dividing them into different parts…

Hive

How to delete partition data as well from a Hive external table on a DROP command?

As you know, external tables are tables whose data Hive does not manage. So when…