Dynamic partition in hive
Dynamic partitioning is a tactical method for loading data from a non-partitioned table. Where there are large number of partition values , then its is hard to mention each partition manually. Dynamic partition columns are used mostly on columns whose values are only known at EXECUTION TIME. Dynamic Partition columns are specified the same way as it is for static partition columns, in the partition-clause. The main difference is that dynamic partition columns do not have values while static partition columns have. In the partition-clause we need to specify all partitioning columns, even though all of them are dynamic partition columns. All dynamic partition columns, only allowed in nonstrict mode. In strict mode, it should throw an error. The dynamic partitioning column’s type should be derived from the expression. The data type has to be able to be converted to a string in order to be saved as a directory name in HDFS. After converting column value to string, we still need to convert the string value to a valid directory name. Some reasons are (a) String length is unlimited in theory, but HDFS/local FS directory name length is limited. (b) String value could contains special characters that is reserved in FS path names (such as ‘/’ or ‘..’ etc ). If there are multiple partitioning columns, their order is significant since that translates to the directory structure in HDFS / S3 etc.
Sample Data
144,Tim John,11,M,max,UK,262 145,Elon Musk,77,F,mid,USA,273 146,Jerry Jhones,17,M,max,IND,274 147,Nick Cater,52,M,mid,USA,275 148,Briam Rodgers ,59,F,mid,USA,276
Create External Table
create external table clients_pii_tbl (
id1 int,
name string,
age int,
sex string,
rge string,
country string,
id2 int)
row format delimited
fields terminated by ','
location 's3://freshers-in-data/hive-data/dynamicsample' ;
Create a partition table
create external table clients_pii_dynamic_tbl (
id1 int,
name string,
age int,
sex string,
rge string,
id2 int)
partitioned by (country string)
row format delimited
fields terminated by ',';
In the above partitioning columns should be mentioned as partitioned by , and should not used as fields name . If used will get an error as FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns.
Hive Setting
The below need to be enabled. If using hive console, then paste this in hive terminal.
set hive.exec.dynamic.partition = true;
set hive.exec.dynamic.partition.mode = nostrict;
set hive.exec.dynamic.partition.pernode = 3;
set hive.exec.max.created.files = 100;
Execute the insert script
insert into table clients_pii_dynamic_tbl
partition(country)
select id1 ,name ,age ,sex ,rge ,id2,country
from clients_pii_tbl;
Loading Logs
Result of the above query
Loading data to table sac01.t3p_dp1 partition (country=null)
Time taken for load dynamic partitions : 1318
Loading partition {country=UK}
Loading partition {country=CHN}
Loading partition {country=USA}
Loading partition {country=IND}
Result
show partitions t3p_dp1;
OK
country=IND
country=UK
country=USA
Time taken: 0.108 seconds, Fetched: 4 row(s)
Reference