Hive's dynamic partitioning is a feature that enables the automatic partitioning of data in Hive…
Tag: Big Data
Explain how you can implement dynamic partitioning in Hive (automatically creating partitions based on column values)
Dynamic partitioning in Hive. Dynamic partitioning is a method for loading data from a…
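The idea can be sketched in HiveQL as follows (a minimal example; the table and column names `sales_staging`, `sales`, `country` are hypothetical, while the two `SET` properties are the standard switches for dynamic partitioning):

```sql
-- Enable dynamic partitioning; nonstrict mode allows all partitions to be dynamic
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Hypothetical tables: sales_staging (unpartitioned) and sales (partitioned by country)
INSERT OVERWRITE TABLE sales PARTITION (country)
SELECT id, amount, country   -- the partition column must come last in the SELECT
FROM sales_staging;
```

Hive creates one partition per distinct `country` value it encounters; in the default strict mode, at least one partition column would have to be given a static value.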
How to insert data from a non-partitioned table into a partitioned table in Hive?
You can insert data from a non-partitioned table into a partitioned table; in short, if you want to have…
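A short HiveQL sketch of this pattern (the tables `emp` and `emp_part` and the `dept` column are hypothetical):

```sql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Partitioned target table; dept is a partition column, not a data column
CREATE TABLE emp_part (id INT, name STRING)
PARTITIONED BY (dept STRING);

-- The last column of the SELECT (dept) drives which partition each row lands in
INSERT INTO TABLE emp_part PARTITION (dept)
SELECT id, name, dept
FROM emp;
```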
What is the difference between concat and concat_ws in PySpark?
concat vs concat_ws. Syntax: pyspark.sql.functions.concat(*cols), pyspark.sql.functions.concat_ws(sep, *cols). concat concatenates multiple input columns together into a single column. The…
How to add a new column in PySpark using withColumn
withColumn. Syntax: DataFrame.withColumn(colName, col). withColumn is commonly used to add a column to an existing DataFrame. withColumn returns a new…
How to use a filter or where condition in PySpark
filter / where. The filter condition filters rows based on one or more conditions. where() is an alias for filter(). In…
Explain complex datatypes in PySpark (ArrayType, MapType, StructType)
There are three complex datatypes in PySpark: (1) ArrayType, (2) MapType, (3) StructType. ArrayType represents values comprising a sequence…
How to create tables from Spark Dataframe and join the tables (createOrReplaceTempView)
createOrReplaceTempView. There are many scenarios in which you can do a transformation using SQL instead of direct Spark DataFrame operations…
How to transform a JSON column into multiple columns based on keys in PySpark
JSON column to multiple columns. Consider a situation where the incoming raw data has a JSON column, and you need…
How to parse a column containing a JSON string using PySpark (from_json)
from_json. If you have a JSON object in a column and need to do any transformation on it, you can use from_json. from_json…
How to get the common elements from two arrays in two columns in PySpark (array_intersect)
array_intersect. When you want to get the common elements from two arrays in two columns in PySpark, you can use…