get_json_object
get_json_object will extracts json object from a json string based on json path mentioned and this will and returns json string of the extracted json object.
Syntax
get_json_object(col, path)
col : This should be the string column which is in in json format.
path : This should be the path to the json object to extract.
Sample code
from pyspark.sql import SparkSession
from pyspark.sql.functions import get_json_object
data = [(1,'''{"country":"United States", "state":"Delaware", "state_capital":"Dover", "zip_quantity":98}'''),\
(2,'''{"country":"United States", "state":"Georgia", "state_capital":"Atlanta", "zip_quantity":973}'''),\
(3,'''{"country":"United States", "state":"Indiana", "state_capital":"Indianapolis", "zip_quantity":987}'''),\
(4,'''{"country":"United States", "state":"Maryland", "state_capital":"Annapolis", "zip_quantity":622 }'''),\
(5,'''{"country":"United States", "state":"Texas"}''')]
df = spark.createDataFrame(data,("si_no","state_info"))
df.show(20,False)
+-----+--------------------------------------------------------------------------------------------------+
|si_no|state_info |
+-----+--------------------------------------------------------------------------------------------------+
|1 |{"country":"United States", "state":"Delaware", "state_capital":"Dover", "zip_quantity":98} |
|2 |{"country":"United States", "state":"Georgia", "state_capital":"Atlanta", "zip_quantity":973} |
|3 |{"country":"United States", "state":"Indiana", "state_capital":"Indianapolis", "zip_quantity":987}|
|4 |{"country":"United States", "state":"Maryland", "state_capital":"Annapolis", "zip_quantity":622 } |
|5 |{"country":"United States", "state":"Texas"} |
+-----+--------------------------------------------------------------------------------------------------+
Using get_json_object , we are going to extract json object in each column
df2= df.select(df.si_no,get_json_object(df.state_info,"$.country").alias("country"),\
get_json_object(df.state_info,"$.state").alias("state"),\
get_json_object(df.state_info,"$.state_capital").alias("state_capital"),\
get_json_object(df.state_info,"$.zip_quantity").alias("zip_quantity")).show()
+-----+-------------+--------+-------------+------------+
|si_no|country |state |state_capital|zip_quantity|
+-----+-------------+--------+-------------+------------+
|1 |United States|Delaware|Dover |98 |
|2 |United States|Georgia |Atlanta |973 |
|3 |United States|Indiana |Indianapolis |987 |
|4 |United States|Maryland|Annapolis |622 |
|5 |United States|Texas |null |null |
+-----+-------------+--------+-------------+------------+