How to convert Array elements to Rows in PySpark? PySpark – Explode example code

user January 29, 2022

Function : pyspark.sql.functions.explode

To convert an array-of-arrays column to rows in PySpark, we use the explode function. explode returns a new row for each element in the given array or map, so there is no need to loop over the elements yourself. It uses the default column name 'col' for elements of an array, and 'key' and 'value' for elements of a map, unless specified otherwise. Note that on an array of arrays, explode unpacks only the outer level, so each output row holds one inner array.

Syntax : explode(col)

Example:

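from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

# Reuse the active session (for example in the pyspark shell) or create one.
spark = SparkSession.builder.appName("explode-example").getOrCreate()

# Each channel carries an array of two arrays: US states and Canadian provinces.
raw_data = [
    ("CBS", [["AL", "CA", "DE", "KS"], ["AB", "BC", "MB", "NB"]]),
    ("Fox", [["TX", "WY", "WA", "AZ"], ["NT", "NS", "NU", "ON"]]),
    ("MyNetworkTV", [["DE", "KS", "NE", "PA"], ["MB", "NB", "NL", "NT"]]),
    ("NBC", [["PA", "TX", "WY", "WA"], ["NT", "NS", "NU", "ON"]]),
    ("ESPN", [["WY", "WA", "AZ", "CO", "DC"], ["ON", "PE", "QC", "SK"]]),
    ("PBS", [["NE", "PA", "TX", "WY"], ["NL", "NT", "NS", "NU"]]),
    ("UPN", [["PA", "TX", "WY", "WA"], ["BC", "MB", "NB", "SK", "YT"]]),
    ("MLB Network", [["CA", "DE", "KS", "NE"], ["ON", "PE", "QC", "SK", "YT"]]),
]
df = spark.createDataFrame(data=raw_data, schema=['CHANNEL_NM', 'US_CANADA_STATES'])

# The default show() cuts long values at 20 characters; truncate=False prints them in full.
df.show()
df.show(truncate=False)

# explode unpacks the outer array: one row per inner array, in a column named 'col'.
df2 = df.select(df.CHANNEL_NM, explode(df.US_CANADA_STATES))
df2.show()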

