In PySpark we can replace a value in one column or multiple columns or multiple…
Tag: Big Data
What is the Swappiness Value? What is the role of the Swappiness Value during cluster setup?
vm.swappiness is a Linux kernel parameter. Its value ranges from 0 to 100 and controls the swapping…
What is the Snowflake Merge Command? How to use it?
The Snowflake Merge command allows you to perform merge operations between two tables. The Merge operation includes Insert, Delete,…
What are the Data Processing Operators in Snowflake?
Filter: Represents an operation that filters the records. Attributes: Filter condition – the condition used to perform filtering. Join…
What are the Query Operators supported by Snowflake?
Snowflake supports most of the standard operators defined in SQL:1999. Arithmetic Operators: +, -, *, /,…
PySpark: how to get rows having nulls for a column, columns without nulls, or a count of non-nulls
pyspark.sql.Column.isNotNull isNotNull(): True if the current expression is NOT null. isNull(): True if the current expression is null. With…
PySpark – groupby with aggregation (count, sum, mean, min, max)
pyspark.sql.DataFrame.groupBy The PySpark groupBy function groups the DataFrame using the specified columns to run aggregations (count, sum, mean, min, max) on them…
PySpark filter: How to filter data in PySpark – multiple options explained.
pyspark.sql.DataFrame.filter The PySpark filter function is used to filter the data in a Spark DataFrame; in short, it is used for cleansing…
Amazon Aurora quick reference and cheat sheet.
1. Aurora is an AWS proprietary database. 2. Aurora is a fully managed service. 3. Aurora has high performance and…
Amazon Athena quick reference and cheat sheet
1. Amazon Athena is an interactive query service to analyze data in Amazon S3 using standard SQL. 2. Athena is…
How to drop multiple partitions in Hive by giving a condition.
Hive partitioning is a good and easy way to organize Hive tables by dividing them into different parts…