Loading JSON schema from a JSON string in PySpark

In PySpark, you can load a JSON schema from a JSON string by parsing the string and building a schema object from it. This lets you define the schema for your data dynamically, which is useful when your data structure evolves or when you need the flexibility to handle different JSON structures.

1. Importing PySpark

First, make sure you have PySpark installed. You can install it using pip:

pip install pyspark

Import the necessary modules:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
import json

2. Creating a SparkSession

Create a SparkSession, the entry point for using PySpark:

spark = SparkSession.builder.appName("JSONSchema from JSONString at Freshers.in").getOrCreate()

3. Defining the JSON schema

Define your schema as a JSON string. The string uses the same layout Spark produces when it serializes a StructType: a "struct" type with a list of field definitions. Here's an example:

schema_json_string = """
{
    "type": "struct",
    "fields": [
        {"name": "id", "type": "integer", "nullable": true, "metadata": {}},
        {"name": "first_name", "type": "string", "nullable": true, "metadata": {}},
        {"name": "last_name", "type": "string", "nullable": true, "metadata": {}},
        {"name": "age", "type": "integer", "nullable": true, "metadata": {}},
        {"name": "salary", "type": "double", "nullable": true, "metadata": {}}
    ],
    "metadata": {}
}
"""

4. Creating a StructType schema

Parse the JSON schema string and create a StructType schema object:

schema_dict = json.loads(schema_json_string)
schema = StructType.fromJson(schema_dict)
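
To confirm the string was parsed the way you expect, you can inspect the resulting StructType before using it (a quick, optional check):

# Compact, human-readable form of the parsed schema
print(schema.simpleString())
# struct<id:int,first_name:string,last_name:string,age:int,salary:double>

# Or walk the individual fields
for field in schema.fields:
    print(field.name, field.dataType, field.nullable)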

5. Loading JSON Data with the schema

Now, you can load JSON data using the defined schema:

json_data = [
    {"id": 1, "first_name": "Sachin", "last_name": "Tendulkar", "age": 30, "salary": 50000.0},
    {"id": 2, "first_name": "Rajesh", "last_name": "Kanna", "age": 25, "salary": 60000.0},
    {"id": 3, "first_name": "Mahesh", "last_name": "Raj", "age": 35, "salary": 75000.0}
]

df = spark.createDataFrame(json_data, schema=schema)
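
The same StructType also works when reading JSON files from storage: passing it to the reader lets Spark skip schema inference, which is faster and keeps the column types predictable. A small sketch, where the file path is just a placeholder:

# "/path/to/people.json" is a hypothetical location; point it at your own data
df_from_file = spark.read.schema(schema).json("/path/to/people.json")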

6. Viewing the DataFrame

You can now perform various operations on the DataFrame, such as displaying the schema or showing the first few rows of data:

df.printSchema()
df.show()

Output

root
 |-- id: integer (nullable = true)
 |-- first_name: string (nullable = true)
 |-- last_name: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- salary: double (nullable = true)

+---+----------+---------+---+-------+
| id|first_name|last_name|age| salary|
+---+----------+---------+---+-------+
|  1|    Sachin|Tendulkar| 30|50000.0|
|  2|    Rajesh|    Kanna| 25|60000.0|
|  3|    Mahesh|      Raj| 35|75000.0|
+---+----------+---------+---+-------+
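
Because the schema is just a JSON string, it is easy to keep it outside your code, for example in a small file checked into version control, and load it at runtime. A minimal sketch using the standard json module and a hypothetical file name schema.json:

# Write the schema string out once (hypothetical file name)
with open("schema.json", "w") as f:
    f.write(schema_json_string)

# Later, read it back and rebuild the StructType
with open("schema.json") as f:
    loaded_schema = StructType.fromJson(json.load(f))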