Tag: Big Data
How to delete partition data from a Hive external table on the DROP command?
As you know, external tables are tables whose data Hive does not manage. So when…
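One common workaround: temporarily mark the external table as managed ('EXTERNAL'='FALSE'), drop the partition so that Hive deletes its files, then set 'EXTERNAL'='TRUE' again. The helper below is a minimal sketch that only builds the HiveQL strings. The function name and the table and partition values are hypothetical, and you would run the statements yourself (for example through beeline or spark.sql). Newer Hive releases also document an 'external.table.purge' table property for this purpose. Check that your version supports it before relying on it.

```python
from __future__ import annotations


def drop_partition_with_data(table: str, partition: dict[str, str]) -> list[str]:
    """Build HiveQL that drops one partition of an external table *and* its files.

    Hypothetical helper: it only returns the statements, it does not run them.
    """
    # Render {"dt": "2020-01-01"} as: dt='2020-01-01'
    spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
    return [
        # While the table is treated as managed, DROP PARTITION also deletes the data
        f"ALTER TABLE {table} SET TBLPROPERTIES('EXTERNAL'='FALSE')",
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({spec})",
        # Flip it back so later drops leave the underlying files alone again
        f"ALTER TABLE {table} SET TBLPROPERTIES('EXTERNAL'='TRUE')",
    ]


if __name__ == "__main__":
    for stmt in drop_partition_with_data("sales_ext", {"dt": "2020-01-01"}):
        print(stmt + ";")
```

Keeping the three statements together matters: if the last one is skipped, the table stays managed and a later DROP TABLE would delete all of its data.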
How to convert a Hive managed table to an external table without recreating it?
In Hive, managed (internal) tables are Hive-owned tables, and the table data are managed and controlled by…
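The conversion is a metadata change: setting the EXTERNAL table property switches the table type in place, without moving or copying data. A minimal sketch under that assumption follows. The helper name is hypothetical, and it only returns the HiveQL to run. The upper-case 'TRUE' follows the commonly reported advice that some older Hive versions compare this value case-sensitively.

```python
from __future__ import annotations


def convert_table_type_sql(table: str, external: bool = True) -> list[str]:
    """Hypothetical helper returning HiveQL that switches a table's type in place."""
    # Upper case on purpose: some older Hive versions reportedly ignore 'true'
    flag = "TRUE" if external else "FALSE"
    return [
        f"ALTER TABLE {table} SET TBLPROPERTIES('EXTERNAL'='{flag}')",
        # Verify via the "Table Type" row: MANAGED_TABLE or EXTERNAL_TABLE
        f"DESCRIBE FORMATTED {table}",
    ]


if __name__ == "__main__":
    for stmt in convert_table_type_sql("db.events"):
        print(stmt + ";")
```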
How to force serialization in AWS Redshift by locking all tables?
You can force serialization by locking all tables in each session. The LOCK command blocks operations that would result in…
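In Redshift, LOCK only has an effect inside a transaction, and the locks are held until that transaction ends. So each session wraps its work as BEGIN; LOCK ...; statements; END;. The sketch below builds that script as a string. The helper name and the table names are hypothetical. Sorting the tables is a design choice: if every session locks tables in the same order, the risk of two sessions deadlocking on each other is reduced.

```python
from __future__ import annotations


def locked_transaction(tables: list[str], statements: list[str]) -> str:
    """Wrap statements in one transaction that first LOCKs every listed table.

    Hypothetical helper: returns SQL text, it does not connect to Redshift.
    """
    if not tables:
        raise ValueError("at least one table must be locked")
    # Same lock order in every session (sorted) to reduce deadlock risk
    lines = ["BEGIN;", f"LOCK {', '.join(sorted(tables))};"]
    # Normalise each statement to exactly one trailing semicolon
    lines += [stmt.rstrip().rstrip(";") + ";" for stmt in statements]
    # END commits the transaction in Redshift, which releases the locks
    lines.append("END;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(locked_transaction(["orders", "customers"],
                             ["DELETE FROM orders WHERE id = 1"]))
```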
PySpark – How to create an RDD from a list and from AWS S3
In this article, you will learn what an RDD is and how to create an RDD from a…
How to run Spark SQL on a DataFrame – PySpark
If you can easily get the result with SQL, or the SQL already exists, then you…
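The pattern is to register the data under a table name and then hand the existing SQL to the engine unchanged. In PySpark that is df.createOrReplaceTempView("employees") followed by spark.sql(existing_sql). So that this sketch runs without a Spark installation, it swaps in Python's built-in sqlite3 as a stand-in engine. The table, columns and rows are made up for illustration.

```python
import sqlite3

# Stand-in for a DataFrame: made-up rows with a made-up schema
rows = [("alice", "sales", 100), ("bob", "sales", 80), ("carol", "hr", 90)]

# SQL that already exists and that we want to reuse as-is
existing_sql = """
    SELECT dept, SUM(salary) AS total
    FROM employees
    GROUP BY dept
    ORDER BY dept
"""

conn = sqlite3.connect(":memory:")
# Plays the role of df.createOrReplaceTempView("employees") in PySpark
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)

# Plays the role of spark.sql(existing_sql) in PySpark
result = conn.execute(existing_sql).fetchall()
print(result)
```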
How to get all combinations of columns using PySpark? What is a cube in Spark?
A cube is a multi-dimensional generalization of a two- or three-dimensional spreadsheet. "Cube" is shorthand for multidimensional dataset, given…
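A cube over n columns aggregates over every subset of those columns, giving 2^n grouping sets. In PySpark the call is df.cube("city", "year").sum("amount"). The plain-Python sketch below reproduces those semantics on a small made-up list of rows so the grouping sets are easy to see. None marks a rolled-up ("all values") column, the way Spark shows null there, so the sketch assumes the dimension columns contain no None of their own.

```python
from collections import defaultdict
from itertools import combinations


def cube_sum(rows, dims, measure):
    """Sum `measure` over every grouping set of `dims` (2 ** len(dims) of them).

    Plain-Python illustration of GROUP BY CUBE semantics, not Spark itself.
    """
    totals = defaultdict(int)
    for size in range(len(dims) + 1):
        for grouping_set in combinations(dims, size):
            for row in rows:
                # Columns outside the grouping set are rolled up to None
                key = tuple(row[d] if d in grouping_set else None for d in dims)
                totals[key] += row[measure]
    return dict(totals)


sales = [
    {"city": "A", "year": 2020, "amount": 10},
    {"city": "A", "year": 2021, "amount": 5},
    {"city": "B", "year": 2020, "amount": 7},
]
for key, total in cube_sum(sales, ["city", "year"], "amount").items():
    print(key, total)
```

Here (None, None) is the grand total, (city, None) rows are per-city subtotals, (None, year) rows are per-year subtotals, and the remaining rows are the full city and year breakdown.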
How can I get all the Hive tables with their external locations, partitions, etc.?
There may be situations where you need to list all the Hive tables created, along with their locations and…
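Table names, types and locations live in the metastore's backing database. A query along these lines joins TBLS to DBS (database name), SDS (storage location) and PARTITIONS. To keep the sketch runnable, it builds a toy copy of those tables in Python's built-in sqlite3, with only the columns the query uses and made-up rows. Against a real metastore (often MySQL or PostgreSQL), you would run just the QUERY text. On PostgreSQL, the upper-case identifiers typically need double quotes.

```python
import sqlite3

# Toy copy of four Hive metastore tables, reduced to the columns the query uses.
# All rows below are made-up sample data.
SCHEMA = """
CREATE TABLE DBS        (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE SDS        (SD_ID INTEGER PRIMARY KEY, LOCATION TEXT);
CREATE TABLE TBLS       (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, SD_ID INTEGER,
                         TBL_NAME TEXT, TBL_TYPE TEXT);
CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER, SD_ID INTEGER,
                         PART_NAME TEXT);
INSERT INTO DBS  VALUES (1, 'default');
INSERT INTO SDS  VALUES (10, 's3a://bucket/sales'),
                        (11, 's3a://bucket/sales/dt=2020-01-01'),
                        (12, 'hdfs://namenode/warehouse/tmp');
INSERT INTO TBLS VALUES (100, 1, 10, 'sales', 'EXTERNAL_TABLE'),
                        (101, 1, 12, 'tmp', 'MANAGED_TABLE');
INSERT INTO PARTITIONS VALUES (1000, 100, 11, 'dt=2020-01-01');
"""

# One row per table, or one row per partition for partitioned tables
QUERY = """
SELECT d.NAME, t.TBL_NAME, t.TBL_TYPE, s.LOCATION, p.PART_NAME, ps.LOCATION
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
JOIN SDS s ON t.SD_ID = s.SD_ID
LEFT JOIN PARTITIONS p ON p.TBL_ID = t.TBL_ID
LEFT JOIN SDS ps ON p.SD_ID = ps.SD_ID
ORDER BY d.NAME, t.TBL_NAME, p.PART_NAME
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for row in conn.execute(QUERY):
    print(row)
```

The LEFT JOINs keep unpartitioned tables in the output, with None in the partition columns.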
Hive – Where can I get the Hive metastore details (credentials, host, server, etc.)?
In general, we don’t deal directly with the Hive metastore. But there are some situations where we may need to…
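The metastore connection settings are normally stored in hive-site.xml, in the javax.jdo.option.* properties, plus hive.metastore.uris for the Thrift metastore service. The sketch below extracts them with the standard-library XML parser. The short labels in WANTED are our own naming, the sample values are invented, and the file's location (often something like /etc/hive/conf/hive-site.xml) varies by distribution.

```python
import xml.etree.ElementTree as ET

# hive-site.xml property name -> short label (labels are our own naming)
WANTED = {
    "javax.jdo.option.ConnectionURL": "jdbc_url",
    "javax.jdo.option.ConnectionDriverName": "driver",
    "javax.jdo.option.ConnectionUserName": "user",
    "javax.jdo.option.ConnectionPassword": "password",
    "hive.metastore.uris": "thrift_uris",
}


def metastore_details(xml_text: str) -> dict:
    """Pick the metastore connection properties out of a hive-site.xml document."""
    details = {}
    for prop in ET.fromstring(xml_text.strip()).iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in WANTED:
            details[WANTED[name]] = (prop.findtext("value") or "").strip()
    return details


# Made-up sample; a real file usually contains many more properties
SAMPLE = """
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-db.example.com:3306/hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.example.com:9083</value>
  </property>
</configuration>
"""

print(metastore_details(SAMPLE))
```

For a real file, pass its contents, e.g. metastore_details(open(path).read()). The host and database name are embedded in the JDBC URL.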
Hive – What are the metastore tables in Hive?
The metastore is the central repository of Apache Hive metadata. It stores the metadata for Hive tables in its own tables, such as AUX_TABLE, BUCKETING_COLS, CDS, COLUMNS_V2, COMPACTION_QUEUE…
How to remove a CSV header using Spark (PySpark)
A common use case when dealing with CSV files is to remove the header from the source to do data…
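In PySpark, the usual options are spark.read.csv(path, header=True), or, on an RDD of lines, header = rdd.first() followed by rdd.filter(lambda line: line != header). The plain-Python sketch below mirrors that filter on an in-memory list so it runs without Spark. The function name and sample lines are hypothetical.

```python
import csv


def remove_header(lines):
    """Return (header_fields, data_rows) from raw CSV lines.

    Mirrors the RDD idiom: header = rdd.first(); rdd.filter(lambda l: l != header).
    """
    header = lines[0]
    # Like the RDD filter, this drops *every* line equal to the header,
    # which also removes repeated headers from concatenated files
    data = [line for line in lines if line != header]
    fields = next(csv.reader([header]))
    rows = list(csv.reader(data))
    return fields, rows


lines = ["id,name", "1,alice", "id,name", "2,bob"]
fields, rows = remove_header(lines)
print(fields, rows)
```

A side effect of filtering by equality is that a genuine data row identical to the header would also be dropped. The header=True reader option avoids that, because it treats only the first line as the header.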