Tag: Apache

PySpark @ Freshers.in

PySpark: How to create an RDD from a List and from AWS S3

In this article you will learn what an RDD is. How can we create an RDD from a…

How to run a DataFrame as Spark SQL – PySpark

If you can easily get the result using SQL, or the SQL already exists, then you…

How can I get all the Hive tables and their external locations, partitions, etc.?

There may be situations where you need to provide all the Hive tables created and their locations and…

Hive – Where can I get the Hive metastore details (credentials, host, server, etc.)?

In general, we don’t deal directly with the Hive metastore, but there are some situations where we may need to…

Hive – What are the metastore tables in Hive?

The metastore is the central repository of Apache Hive metadata. It stores metadata for Hive tables. AUX_TABLE, BUCKETING_COLS, CDS, COLUMNS_V2, COMPACTION_QUEUE…

How to remove a CSV header using Spark (PySpark)

A common use case when dealing with a CSV file is to remove the header from the source to do data…

Apache Pig interview questions

1. What is Pig? Pig is an Apache open source project which runs on top of Hadoop and provides an engine for data…