Function : pyspark.sql.functions.explode
To convert an array-of-arrays column to rows in PySpark, use the "explode" function. Explode returns a new row for each element in the given array or map, which avoids writing explicit loops. Unless specified otherwise, it uses the default column name 'col' for elements of an array, and 'key' and 'value' for elements of a map. When the column is an array of arrays, a single explode removes only the outer level, so each resulting row holds one inner array.
Syntax : explode(col)
Example:
from pyspark.sql.functions import explode

# Assumes an active SparkSession named `spark` (as in the pyspark shell or a notebook).
# Each row: a channel name and an array of two arrays (US states, Canadian provinces).
raw_data = [
    ("CBS", [["AL","CA","DE","KS"], ["AB","BC","MB","NB"]]),
    ("Fox", [["TX","WY","WA","AZ"], ["NT","NS","NU","ON"]]),
    ("MyNetworkTV", [["DE","KS","NE","PA"], ["MB","NB","NL","NT"]]),
    ("NBC", [["PA","TX","WY","WA"], ["NT","NS","NU","ON"]]),
    ("ESPN", [["WY","WA","AZ","CO","DC"], ["ON","PE","QC","SK"]]),
    ("PBS", [["NE","PA","TX","WY"], ["NL","NT","NS","NU"]]),
    ("UPN", [["PA","TX","WY","WA"], ["BC","MB","NB","SK","YT"]]),
    ("MLB Network", [["CA","DE","KS","NE"], ["ON","PE","QC","SK","YT"]]),
]

df = spark.createDataFrame(data=raw_data, schema=['CHANNEL_NM', 'US_CANADA_STATES'])
df.show()                # long values are truncated in the output
df.show(truncate=False)  # full column contents

# One output row per inner array; the exploded column is named 'col' by default.
df2 = df.select(df.CHANNEL_NM, explode(df.US_CANADA_STATES))
df2.show()
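To make the row expansion concrete, here is a minimal plain-Python sketch of what explode does to the rows. It is only a model of the behaviour, not PySpark's implementation: the helper `explode_rows` and the "Empty" and "Missing" rows are hypothetical, added for illustration. It shows that one pass yields a row per inner array, a second pass yields a row per state, and that rows with an empty or null array produce no output.

```python
from typing import Any, List, Optional, Tuple

def explode_rows(rows: List[Tuple[str, Optional[list]]]) -> List[Tuple[str, Any]]:
    """Model of explode: emit one (key, element) row per array element.

    Rows whose array is None or empty emit nothing, as with explode.
    """
    out = []
    for key, arr in rows:
        if not arr:  # None or [] contributes no rows
            continue
        for element in arr:
            out.append((key, element))
    return out

raw_data = [
    ("CBS", [["AL", "CA", "DE", "KS"], ["AB", "BC", "MB", "NB"]]),
    ("Empty", []),      # hypothetical row: empty array
    ("Missing", None),  # hypothetical row: null array
]

once = explode_rows(raw_data)  # one row per inner array
twice = explode_rows(once)     # one row per individual state/province

for row in once:
    print(row)
for row in twice:
    print(row)
```

In PySpark, the second pass would correspond to applying explode again to the 'col' column of df2. If the dropped rows need to be kept, `explode_outer` from the same module may be a better fit, since it emits a null instead of skipping the row.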