How to convert Array elements to Rows in PySpark? PySpark – Explode example code.

PySpark @ Freshers.in

Function : pyspark.sql.functions.explode

To convert an array-of-arrays column to rows in PySpark, use the explode function. explode returns a new row for each element in the given array or map, which avoids writing explicit loops. With an array of arrays, a single explode yields one row per inner array. Unless specified otherwise, the output uses the default column name 'col' for array elements, and 'key' and 'value' for map entries.

Syntax : explode(col)

Example:

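from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

# Reuse the active session (for example in the pyspark shell) or create one
spark = SparkSession.builder.appName("explode_example").getOrCreate()

# Each channel has an array of arrays: [US states, Canadian provinces]
raw_data = [
("CBS",[["AL","CA","DE","KS"],["AB","BC","MB","NB"]]),
("Fox",[["TX","WY","WA","AZ"],["NT","NS","NU","ON"]]),
("MyNetworkTV",[["DE","KS","NE","PA"],["MB","NB","NL","NT"]]),
("NBC",[["PA","TX","WY","WA"],["NT","NS","NU","ON"]]),
("ESPN",[["WY","WA","AZ","CO","DC"],["ON","PE","QC","SK"]]),
("PBS",[["NE","PA","TX","WY"],["NL","NT","NS","NU"]]),
("UPN",[["PA","TX","WY","WA"],["BC","MB","NB","SK","YT"]]),
("MLB Network",[["CA","DE","KS","NE"],["ON","PE","QC","SK","YT"]])]
df = spark.createDataFrame(data=raw_data, schema=['CHANNEL_NM','US_CANADA_STATES'])
df.show()                # long values are truncated to 20 characters
df.show(truncate=False)  # full array contents
# One output row per inner array; the exploded column is named 'col' by default
df2 = df.select(df.CHANNEL_NM, explode(df.US_CANADA_STATES))
df2.show()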

Reference


Spark official page

Author: user
