Tag: Apache
PySpark – How to create an RDD from a List and from AWS S3
In this article you will learn what an RDD is and how to create an RDD from a…
How to run a DataFrame as Spark SQL – PySpark
If you are in a situation where you can easily get the result using SQL, or the SQL already exists, then you…
How can I get all the Hive tables and their external locations, partitions, etc.?
user June 5, 2021
There may be situations where you need to provide all the Hive tables created, along with their locations and…
Hive – Where can I get the Hive metastore details (credentials, host, server, etc.)?
user June 5, 2021
In general, we don’t deal directly with the Hive metastore. But there are some situations where we may need to…
Hive – What are the metastore tables in Hive?
The metastore is the central repository of Apache Hive metadata. It stores metadata for Hive tables. AUX_TABLE, BUCKETING_COLS, CDS, COLUMNS_V2, COMPACTION_QUEUE…
How to remove a CSV header using Spark (PySpark)
A common use case when dealing with a CSV file is to remove the header from the source to do data…
Apache Pig interview questions
1. What is Pig? Pig is an Apache open source project which runs on top of Hadoop and provides an engine for data…