Hive : How to load JSON and nested JSON in Hive and how to view it [Sample code with Data]

In this article, I’ll walk you through how to read JSON data from a Hive table using an example with multiple JSON objects as input. We’ll cover the following steps:

Creating a Hive table to store the JSON data
Loading JSON data into the Hive table
Querying the Hive table to retrieve the JSON data
Reading the JSON data using Apache Spark
Loading NESTED JSON data into the Hive table

Step 1: Creating a Hive table to store the JSON data

To create a Hive table to store the JSON data, you’ll need to define the table schema. The schema should match the structure of the JSON data you’ll be loading into the table.

Here’s an example schema for a Hive table that can store JSON data with the following structure:

{ "id": "1", 
"name": "Sachin", 
"age": 30, 
"email": "sachin@example.com" 
}

CREATE EXTERNAL TABLE test.freshers_json_table 
( 	id string, 
	name string, 
	age int, 
	email string ) 
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' 
STORED AS TEXTFILE LOCATION 's3://freshers-in/training/hive/json_sample/'

This will create a Hive table called test.freshers_json_table with columns id, name, age, and email.

Note the ROW FORMAT SERDE and STORED AS clauses. These specify the format of the data stored in the table. In this case, we’re using the JsonSerDe serde, which can handle JSON data.

Step 2: Loading JSON data into the Hive table

To load JSON data into the Hive table, you’ll need to use the LOAD DATA statement.

Assuming you have a file containing multiple JSON objects like this:

{
“id”: “1”,
“name”: “Sachin”,
“age”: 30,
“email”: “sachin@example.com”
}

{
“id”: “2”,
“name”: “Tony Dane”,
“age”: 25,
“email”: “tony.dane@example.com”
}

Actual JSON in file looks like

{"id": "1",  "name": "Sachin",  "age": 30,  "email": "sachin@example.com"}
{"id": "2",  "name": "Tony",  "age": 25,  "email": "tony@example.com"}

In the create statement LOCATION ‘s3://freshers-in/training/hive/json_sample/’ will load the data from this location.

Step 3: Querying the Hive table to retrieve the JSON data

To retrieve the JSON data from the Hive table, you can use a simple SELECT statement:

select * from test.freshers_json_table

Step 4: Reading the JSON data using Apache Spark

If you want to analyze the JSON data using Apache Spark, you can read it into a Spark DataFrame using the spark.read.json() method.

val spark = SparkSession.builder().appName("ReadJSONDataFromHiveTable").master("local[*]").getOrCreate()

val myJsonTableDF = spark.read.json("<code class="language-sql">test.freshers_json_table") myJsonTableDF.show()

Step 5: Loading Nested JSON data into the hive tables

Parsing multiple nested JSON data in Hive can be a bit more complex, but it can be done using Hive’s get_json_object function in conjunction with the json_tuple function.

Let’s say we have a nested JSON data structure like this:

Sample JSON

{
“id”: “1”,
“name”: “sachin”,
“age”: 30,
“email”: “sachin@example.com”,
“addresses”: [
{
“street”: “123MainSt”,
“city”: “Maryland”,
“state”: “MD”,
“zipcode”: “21210”
},
{
“street”: “456ElmSt”,
“city”: “Niche ville”,
“state”: “NY”,
“zipcode”: “67890”
}
]
}

In the file you need json in single file format. Each JSON object in each line. The above is just for your understanding. In file it should be like

{ "id": "1", "name": "sachin", "age": 30, "email": "sachin@example.com", "addresses": [ { "street": "123MainSt", "city": "Maryland", "state": "MD", "zipcode": "21210" }, { "street": "456ElmSt", "city": "Niche ville", "state": "NY", "zipcode": "67890" } ] }

CREATE TABLE statement for a Hive table that stores above JSON data.

CREATE EXTERNAL TABLE test.freshers_json_table2
( 	id string, 
	name string, 
	age int, 
	email string, 
	addresses array<struct<street:string, city:string, state:string, zipcode:string>> ) 
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' 
STORED AS TEXTFILE 
LOCATION 's3://freshers-in/training/hive/json_nested_sample/';

In this statement, we’re defining a table with columns for id, name, age, email, and address. The address column is defined as a struct with fields for street, city, state, and zipcode. We’re using the JsonSerDe serde to serialize and deserialize JSON data. The TEXTFILE format is used to store the data in plain text files. If you prefer to use a different file format, you can specify a different storage format.Note that when inserting data into this table, you’ll need to provide the JSON data as a string. For example:

In this statement, we’re defining a table with columns for id, name, age, email, and addresses. The addresses column is defined as an array of structs with fields for street, city, state, and zipcode.

We’re using the JsonSerDe serde to serialize and deserialize JSON data. The TEXTFILE format is used to store the data in plain text files. If you prefer to use a different file format, you can specify a different storage format.

Note that the addresses column is defined as an array. This is because your JSON data contains multiple addresses for each person.

To browse the data you can have the select table as below.

SELECT
  id,
  name,
  age,
  email,
  get_json_object(json_data, '$.addresses.street') as street,
  get_json_object(json_data, '$.addresses.city') as city,
  get_json_object(json_data, '$.addresses.state') as state,
  get_json_object(json_data, '$.addresses.zipcode') as zipcode
FROM test.my_json_table2;

SELECT
  id,
  name,
  age,
  email,
  address.street,
  address.city
FROM
  test.freshers_json_table2
LATERAL VIEW
  explode(addresses) exploded_table AS address;

In you want to insert into hive you can do as follows. But in this case Hive table should not be an EXTERNAL table.

INSERT INTO my_json_table VALUES (
  '1',
  'Mike Walle',
  30,
  'mike.walle@example.com',
  '[{"street":"John Herny MainSt","city":"Alabama","state":"CA","zipcode":"65252"},{"street":"456ElmSt","city":"Jachsonville","state":"OH","zipcode":"54854"}]'
);

Hive important pages to refer

Post Views: 374

Leave a Reply Cancel reply

Trending

Recent Posts

Featured Posts – Slider Widget

AWS EC2 vs Azure Virtual Machines

Production and Industrial Engineering

Engineering Technical campus placement question and answers

JavaScript’s reduceRight() method to iterate over an array from right to left

Merging Multiple Images into a Single PDF File Using Python

Nanotechnology

Electronics and Instrumentation

Chemical Engineering

Civil Engineering

Backpressure in AWS Kinesis Streams: Optimizing Data Processing

Most Viewed Posts