Metastore is the central repository of Apache Hive metadata. It stores metadata for Hive tables…
Author: user
How to copy data from Redshift to Hive? How to unload from Redshift in Parquet format? Explained with sample code.
Some business scenarios require copying daily data from Redshift to Hive. For the compression, it is…
How to force serialization in AWS Redshift by locking all tables?
You can force serialization by locking all tables in each session. The LOCK command blocks operations that would result in…
Serializable isolation violation on table in Redshift: ERROR: 1023: Resolved solution
When you run concurrent Amazon Redshift operations in different sessions, you may get “ERROR: 1023 DETAIL: Serializable…
How to concatenate multiple columns in a Spark dataframe
concat_ws: With the concat_ws() function you can concatenate multiple input string columns into a single string column, using…
Elastic Network Interfaces quick reference and cheat sheet
Elastic Network Interfaces – ENIs are virtual network cards you can attach to your EC2 instances. They are used to…
AWS EC2 quick reference and cheat sheet
Amazon EC2 instances are virtual servers on the AWS cloud. Amazon EC2 provides varying combinations of CPU, memory, storage,…
PySpark – How to create an RDD from a List and from AWS S3
In this article you will learn what an RDD is and how to create an RDD from a…
WaterLight – A portable lantern that can be charged with salt water (used for charging mobile phones)
E-Dina, a Colombian renewable energy start-up, has developed a cordless light which converts salt water into electricity as a…
How to run dataframe as Spark SQL – PySpark
If you are in a situation where you can easily get the result using SQL, or the SQL already exists, then you…
How to get all combinations of columns using PySpark? What is Cube in Spark?
A cube is a multi-dimensional generalization of a two- or three-dimensional spreadsheet. Cube is a shorthand for multidimensional dataset, given…