How to drop multiple partition in Hive by giving condition.

Hive @ Freshers.in

Hive Partitions is a good and easy way to organizes Hive tables into partitions by dividing tables into different parts based on partition keys (YEAR/MONTH/DAY) or (COUNTRY/STATE/ZIP) . Partition is helpful when the table has one or more Partition keys. There are some situation when you want to delete multiple partitions (hundreds) from a Hive table. This can be achieved using DROP command as follows

ALTER TABLE <TABLE NAME> DROP PARTITION (column_name < 'column_value') , PARTITION (column_name >  'column_value');

Example 

ALTER TABLE HRM.FRESHERS_IN_LOGDATA  DROP PARTITION (as_on_year< '2018') , PARTITION (as_on_year > '2020');

Note : If you are having the external table, then the table metadata only will get updated and data still exists in the storage.  When querying you will get the expected result. But if some one do an MSCK REPAIR on the table , the deleted partition will be back on Metadata and you will be able to see the deleted value as well. So if you want to delete permanently you may need to delete the data from the storage as well if you are dealing with external table.

An easy way to make sure that on dropping the data permanently using alter table, then you need to convert the external table to managed table and once the Alter is done, convert it back to external table.

How to delete a partition data as well from Hive external table on DROP command
How to convert a hive managed table to external table without recreating it ?
Hive Drop commands( External URL )

Author: user

Leave a Reply