Author: user

Python

How to merge multiple PDF files using Python?

Use case: If you have multiple files, for example chapter-wise question papers, and you need to have…

PySpark

How to create a UDF in PySpark? What are the different ways you can call a PySpark UDF (with example)?

PySpark UDF: A PySpark UDF is used to extend PySpark's built-in capabilities. UDFs (User Defined Functions) are used to…

PySpark

How to convert MapType to multiple columns based on key using PySpark?

Use case: Converting a map to multiple columns. There can be raw data of MapType with multiple key-value pairs…

Apache Airflow

How to create an Airflow DAG (scheduler) to execute a Redshift query?

Use case: We have a Redshift query (an INSERT SQL statement) to load data from another table on daily…

Hive

How to insert from a non-partitioned table into a partitioned table in Hive?

You can insert data from a non-partitioned table into a partitioned table. In short, if you want to have…

AWS Glue

How to create an AWS Glue table where partitions have different columns?

There can be a situation where you expect new columns in a JSON file regularly. There can be a…

Explain what happens internally once you upload a file to Amazon S3

This article explains what happens inside S3 once you upload a file. The client sends an HTTP…