How to get json object from a json string based on json path specified – get_json_object – PySpark

PySpark @ Freshers.in

get_json_object

get_json_object will extracts json object from a json string based on json path mentioned and this will and returns json string of the extracted json object.

Syntax

get_json_object(col, path)

col : This should be the string column which is in in json format.

path : This should be the path to the json object to extract.

Sample code

from pyspark.sql import SparkSession
from pyspark.sql.functions import get_json_object
data = [(1,'''{"country":"United States", "state":"Delaware", "state_capital":"Dover", "zip_quantity":98}'''),\
(2,'''{"country":"United States", "state":"Georgia", "state_capital":"Atlanta", "zip_quantity":973}'''),\
(3,'''{"country":"United States", "state":"Indiana", "state_capital":"Indianapolis", "zip_quantity":987}'''),\
(4,'''{"country":"United States", "state":"Maryland", "state_capital":"Annapolis", "zip_quantity":622 }'''),\
(5,'''{"country":"United States", "state":"Texas"}''')]
df = spark.createDataFrame(data,("si_no","state_info"))
df.show(20,False)
+-----+--------------------------------------------------------------------------------------------------+
|si_no|state_info                                                                                        |
+-----+--------------------------------------------------------------------------------------------------+
|1    |{"country":"United States", "state":"Delaware", "state_capital":"Dover", "zip_quantity":98}       |
|2    |{"country":"United States", "state":"Georgia", "state_capital":"Atlanta", "zip_quantity":973}     |
|3    |{"country":"United States", "state":"Indiana", "state_capital":"Indianapolis", "zip_quantity":987}|
|4    |{"country":"United States", "state":"Maryland", "state_capital":"Annapolis", "zip_quantity":622 } |
|5    |{"country":"United States", "state":"Texas"}                                                      |
+-----+--------------------------------------------------------------------------------------------------+

Using get_json_object , we are going to extract json object in each column

df2= df.select(df.si_no,get_json_object(df.state_info,"$.country").alias("country"),\
get_json_object(df.state_info,"$.state").alias("state"),\
get_json_object(df.state_info,"$.state_capital").alias("state_capital"),\
get_json_object(df.state_info,"$.zip_quantity").alias("zip_quantity")).show()
+-----+-------------+--------+-------------+------------+
|si_no|country      |state   |state_capital|zip_quantity|
+-----+-------------+--------+-------------+------------+
|1    |United States|Delaware|Dover        |98          |
|2    |United States|Georgia |Atlanta      |973         |
|3    |United States|Indiana |Indianapolis |987         |
|4    |United States|Maryland|Annapolis    |622         |
|5    |United States|Texas   |null         |null        |
+-----+-------------+--------+-------------+------------+
Author: user

Leave a Reply